ISOLATED HUMAN PROTEASE PROTEINS, NUCLEIC ACID MOLECULES
ENCODING HUMAN PROTEASE PROTEINS, AND USES THEREOF 



RELATED APPLICATIONS 

The present application claims priority to provisional application U.S. Serial No.
60/235,557, filed September 27, 2000 (Atty. Docket CL000862-PROV) and U.S. Serial No. 
09/734,675, filed December 13, 2000 (Atty. Docket CL000862). 

FIELD OF THE INVENTION 

The present invention is in the field of protease proteins that are related to the serine 
protease subfamily, recombinant DNA molecules, and protein production. The present invention 
specifically provides novel peptides and proteins that effect protein cleavage/processing/turnover
and nucleic acid molecules encoding such peptide and protein molecules, all of which are useful
in the development of human therapeutics and diagnostic compositions and methods.

BACKGROUND OF THE INVENTION 

The proteases may be categorized into families by the different amino acid sequences
(generally between 2 and 10 residues) located on either side of the cleavage site of the protease. 

The proper functioning of the cell requires careful control of the levels of important
structural proteins, enzymes, and regulatory proteins. One of the ways that cells can reduce the
steady state level of a particular protein is by proteolytic degradation. Further, one of the ways
cells produce functioning proteins is to produce pre- or pro-protein precursors that are processed
by proteolytic degradation to produce an active moiety. Thus, complex and highly regulated
mechanisms have evolved to accomplish this degradation.

Proteases regulate many different cell proliferation, differentiation, and signaling
processes by regulating protein turnover and processing. Uncontrolled protease activity (either
increased or decreased) has been implicated in a variety of disease conditions including
inflammation, cancer, arteriosclerosis, and degenerative disorders.

An additional role of intracellular proteolysis is in the stress-response. Cells that are 
subject to stress such as starvation, heat-shock, chemical insult or mutation respond by 
increasing the rates of proteolysis. One function of this enhanced proteolysis is to salvage amino
acids from non-essential proteins. These amino acids can then be re-utilized in the synthesis of
essential proteins or metabolized directly to provide energy. Another function is in the repair of
damage caused by the stress. For example, oxidative stress has been shown to damage a variety 
of proteins and cause them to be rapidly degraded. 

The International Union of Biochemistry and Molecular Biology (IUBMB) has
recommended to use the term peptidase for the subset of peptide bond hydrolases (Subclass E.C.
3.4.). The widely used term protease is synonymous with peptidase. Peptidases comprise two
groups of enzymes: the endopeptidases and the exopeptidases, which cleave peptide bonds at
points within the protein and remove amino acids sequentially from either N or C-terminus
respectively. The term proteinase is also used as a synonym word for endopeptidase and four
mechanistic classes of proteinases are recognized by the IUBMB: two of these are described
below (also see: Handbook of Proteolytic Enzymes by Barrett, Rawlings, and Woessner AP
Press, NY 1998). Also, for a review of the various uses of proteases as drug targets, see: Weber
M, Emerging treatments for hypertension: potential role for vasopeptidase inhibition; Am J
Hypertens 1999 Nov;12(11 Pt 2):139S-147S; Kentsch M, Otter W, Novel neurohormonal
modulators in cardiovascular disorders. The therapeutic potential of endopeptidase inhibitors.
Drugs R D 1999 Apr;1(4):331-8; Scarborough RM, Coagulation factor Xa: the prothrombinase
complex as an emerging therapeutic target for small molecule inhibitors, J Enzym Inhib
1998;14(1):15-25; Skotnicki JS, et al., Design and synthetic considerations of matrix
metalloproteinase inhibitors, Ann N Y Acad Sci 1999 Jun 30;878:61-72; McKerrow JH, Engel
JC, Caffrey CR, Cysteine protease inhibitors as chemotherapy for parasitic infections, Bioorg
Med Chem 1999 Apr;7(4):639-44; Rice KD, Tanaka RD, Katz BA, Numerof RP, Moore WR,
Inhibitors of tryptase for the treatment of mast cell-mediated diseases, Curr Pharm Des 1998
Oct;4(5):381-96; Materson BJ, Will angiotensin converting enzyme genotype, receptor mutation
identification, and other miracles of molecular biology permit reduction of NNT, Am J Hypertens
1998 Aug;11(8 Pt 2):138S-142S.

Serine Proteases 

The serine proteases (SP) are a large family of proteolytic enzymes that include the
digestive enzymes, trypsin and chymotrypsin, components of the complement cascade and of the
blood-clotting cascade, and enzymes that control the degradation and turnover of
macromolecules of the extracellular matrix. SP are so named because of the presence of a serine
residue in the active catalytic site for protein cleavage. SP have a wide range of substrate
specificities and can be subdivided into subfamilies on the basis of these specificities. The main
sub-families are trypases (cleavage after arginine or lysine), aspases (cleavage after aspartate),
chymases (cleavage after phenylalanine or leucine), metases (cleavage after methionine), and
serases (cleavage after serine).

WO 02/26947 PCT/US01/29960
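
The sub-family list above amounts to a lookup from the residue preceding the scissile bond to a sub-family name. The following is a minimal illustrative sketch in Python, not part of the disclosed invention; the names P1_TO_SUBFAMILY, subfamily_for_residue, and candidate_cleavage_sites are hypothetical, and the sketch considers only the single residue before the cleavage point, which real substrate specificity does not reduce to.

```python
from typing import List

# Hypothetical lookup table built only from the sub-family list in the text:
# one-letter amino acid code of the residue cleaved after -> sub-family name.
P1_TO_SUBFAMILY = {
    "R": "trypase",  # arginine
    "K": "trypase",  # lysine
    "D": "aspase",   # aspartate
    "F": "chymase",  # phenylalanine
    "L": "chymase",  # leucine
    "M": "metase",   # methionine
    "S": "serase",   # serine
}


def subfamily_for_residue(residue: str) -> str:
    """Return the sub-family that cleaves after `residue`, or 'unclassified'."""
    return P1_TO_SUBFAMILY.get(residue.upper(), "unclassified")


def candidate_cleavage_sites(sequence: str, subfamily: str) -> List[int]:
    """Return 1-based positions of residues after which `subfamily` may cleave."""
    return [
        position
        for position, residue in enumerate(sequence.upper(), start=1)
        if P1_TO_SUBFAMILY.get(residue) == subfamily
    ]
```

For instance, calling candidate_cleavage_sites on a short peptide with the sub-family "trypase" lists every lysine and arginine position, which is a coarse first pass rather than a prediction of actual cleavage.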

A series of six SP have been identified in murine cytotoxic T-lymphocytes (CTL) and
natural killer (NK) cells. These SP are involved with CTL and NK cells in the destruction of
virally transformed cells and tumor cells and in organ and tissue transplant rejection (Zunino, S.
J. et al. (1990) J. Immunol. 144:2001-9; Sayers, T. J. et al. (1994) J. Immunol. 152:2289-97).
Human homologs of most of these enzymes have been identified (Trapani, J. A. et al. (1988)
Proc. Natl. Acad. Sci. 85:6924-28; Caputo, A. et al. (1990) J. Immunol. 145:737-44). Like all
SP, the CTL-SP share three distinguishing features: 1) the presence of a catalytic triad of
histidine, serine, and aspartate residues which comprise the active site; 2) the sequence GDSGGP
which contains the active site serine; and 3) an N-terminal IIGG sequence which characterizes
the mature SP.
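
Feature 2 above, the GDSGGP sequence containing the active site serine, lends itself to a simple sequence scan. The sketch below is illustrative only and assumes a protein sequence given as a one-letter-code string; the function name find_active_site_serines is hypothetical, and an exact-match scan of this kind will miss homologs whose motif deviates from GDSGGP.

```python
from typing import List


def find_active_site_serines(sequence: str, motif: str = "GDSGGP") -> List[int]:
    """Return 1-based positions of the serine inside each occurrence of `motif`.

    Assumes the motif contains exactly one serine, as GDSGGP does.
    """
    seq = sequence.upper()
    serine_offset = motif.index("S")  # index of the serine within the motif
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + serine_offset + 1)  # convert to 1-based
        start = seq.find(motif, start + 1)
    return positions
```

An empty result means only that the exact motif is absent; it does not by itself rule out serine protease activity.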

The SP are secretory proteins which contain N-terminal signal peptides that serve to
export the immature protein across the endoplasmic reticulum and are then cleaved (von Heijne
(1986) Nuc. Acid. Res. 14:5683-90). Differences in these signal sequences provide one means of
distinguishing individual SP. Some SP, particularly the digestive enzymes, exist as inactive
precursors or preproenzymes, and contain a leader or activation peptide sequence 3' of the signal
peptide. This activation peptide may be 2-12 amino acids in length, and it extends from the
cleavage site of the signal peptide to the N-terminal IIGG sequence of the active, mature protein.
Cleavage of this sequence activates the enzyme. This sequence varies in different SP according
to the biochemical pathway and/or its substrate (Zunino et al, supra; Sayers et al, supra). Other
features that distinguish various SP are the presence or absence of N-linked glycosylation sites
that provide membrane anchors, the number and distribution of cysteine residues that determine
the secondary structure of the SP, and the sequence of substrate binding sites such as S'. The S'
substrate binding region is defined by residues extending from approximately +17 to +29 relative
to the N-terminal I (+1). Differences in this region of the molecule are believed to determine SP
substrate specificities (Zunino et al, supra).

Trypsin-like serine proteases have been isolated from patients with chronic airway
diseases and may play a role in respiratory diseases and host defense systems on the mucous
membranes of the respiratory system (see Yamaoka et al., J. Biol. Chem. 273: 11895-11901,
1998 and Yasuoka et al., Am. J. Resp. Cell Molec. Biol. 16: 300-308, 1997). Therefore, novel
human serine protease proteins, and encoding genes, may be useful for screening for, diagnosing,
preventing, and/or treating disorders such as respiratory diseases. For example, serine protease
genes/proteins may be useful in drug development, such as by serving as novel drug targets for
respiratory disease, and SNPs in serine protease genes may be useful markers for diagnostic kits
for respiratory diseases.

Trypsinogens

The trypsinogens are serine proteases secreted by exocrine cells of the pancreas (Travis J 
and Roberts R. Biochemistry 1969; 8: 2884-9; Mallory P and Travis J, Biochemistry 1973; 12: 
2847-51). Two major types of trypsinogen isoenzymes have been characterized, trypsinogen-1, 
also called cationic trypsinogen, and trypsinogen-2 or anionic trypsinogen. The trypsinogen 
proenzymes are activated to trypsins in the intestine by enterokinase, which removes an 
activation peptide from the N-terminus of the trypsinogens. The trypsinogens show a high degree
of sequence homology, but they can be separated on the basis of charge differences by using
electrophoresis or ion exchange chromatography. The major form of trypsinogen in the pancreas
and pancreatic juice is trypsinogen-1 (Guy CO et al., Biochem Biophys Res Commun 1984; 125:
516-23). In serum of healthy subjects, trypsinogen-1 is also the major form, whereas in patients
with pancreatitis, trypsinogen-2 is more strongly elevated (Itkonen et al., J Lab Clin Med 1990;
115:712-8). Trypsinogens also occur in certain ovarian tumors, in which trypsinogen-2 is the
major form (Koivunen et al., Cancer Res 1990; 50: 2375-8). Trypsin-1 in complex with
alpha-1-antitrypsin, also called alpha-1-antiprotease, has been found to occur in serum of patients
with pancreatitis (Borgstrom A and Ohlsson K, Scand J Clin Lab Invest 1984; 44: 381-6) but
determination of this complex has not been found useful for differentiation between pancreatic
and other gastrointestinal diseases (Borgstrom et al., Scand J Clin Lab Invest 1989; 49:757-62).

Trypsinogen-1 and -2 are closely related immunologically (Kimland et al., Clin Chim
Acta 1989; 184: 31-46; Itkonen et al., 1990), but by using monoclonal antibodies (Itkonen et al.,
1990) or by absorbing polyclonal antisera (Kimland et al., 1989) it is possible to obtain reagents
enabling specific measurement of each form of trypsinogen.

When active trypsin reaches the blood stream, it is inactivated by the major trypsin 
inhibitors alpha-2-macroglobulin and alpha-1-antitrypsin (AAT). AAT is a 58 kilodalton serine
protease inhibitor synthesized in the liver and is one of the main protease inhibitors in blood.
Whereas complexes between trypsin-1 and AAT are detectable in serum (Borgstrom and
Ohlsson, 1984) the complexes with alpha-2-macroglobulin are not measurable with
antibody-based assays (Ohlsson K, Acta Gastroenterol Belg 1988; 51: 3-12).

Inflammation of the pancreas or pancreatitis may be classified as either acute or chronic
by clinical criteria. With treatment, acute pancreatitis can often be cured and normal function
restored. Chronic pancreatitis often results in permanent damage. The precise mechanisms which
trigger acute inflammation are not understood. However, some causes in the order of their
importance are alcohol ingestion, biliary tract disease, post-operative trauma, and hereditary
pancreatitis. One theory provides that autodigestion, the premature activation of proteolytic
enzymes in the pancreas rather than in the duodenum, causes acute pancreatitis. Any number of
other factors including endotoxins, exotoxins, viral infections, ischemia, anoxia, and direct
trauma may activate the proenzymes. In addition, any internal or external blockage of pancreatic
ducts can also cause an accumulation of pancreatic juices in the pancreas resulting in cellular
damage.

Anatomy, physiology, and diseases of the pancreas are reviewed, inter alia, in Guyton 
AC (1991) Textbook of Medical Physiology, W B Saunders Co, Philadelphia Pa.; Isselbacher K 
J et al (1994) Harrison's Principles of Internal Medicine, McGraw-Hill, New York City; Johnson
K E (1991) Histology and Cell Biology, Harwal Publishing, Media Pa.; and The Merck Manual
of Diagnosis and Therapy (1992) Merck Research Laboratories, Rahway N.J.

Metalloprotease 

The metalloproteases may be one of the older classes of proteinases and are found in 
bacteria, fungi as well as in higher organisms. They differ widely in their sequences and their
structures but the great majority of enzymes contain a zinc atom which is catalytically active. In
some cases, zinc may be replaced by another metal such as cobalt or nickel without loss of the
activity. Bacterial thermolysin has been well characterized and its crystallographic structure
indicates that zinc is bound by two histidines and one glutamic acid. Many enzymes contain the
sequence HEXXH, which provides two histidine ligands for the zinc whereas the third ligand is
either a glutamic acid (thermolysin, neprilysin, alanyl aminopeptidase) or a histidine (astacin).
Other families exhibit a distinct mode of binding of the Zn atom. The catalytic mechanism leads
to the formation of a non covalent tetrahedral intermediate after the attack of a zinc-bound water
molecule on the carbonyl group of the scissile bond. This intermediate is further decomposed by
transfer of the glutamic acid proton to the leaving group.
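
The HEXXH sequence described above can be located with a short pattern search. The following Python sketch is illustrative and rests on stated assumptions: the sequence is a one-letter-code string, X is approximated as any letter A-Z, and the name find_hexxh_motifs is hypothetical. A match identifies only a candidate zinc-binding site, since the pattern says nothing about the third zinc ligand.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping occurrences are all reported.
# Assumption: "X" (any residue) is approximated by any uppercase letter.
HEXXH_PATTERN = re.compile(r"(?=(HE[A-Z]{2}H))")


def find_hexxh_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for each HEXXH occurrence."""
    return [
        (match.start() + 1, match.group(1))
        for match in HEXXH_PATTERN.finditer(sequence.upper())
    ]
```

The lookahead form is chosen so that two motifs sharing a histidine are both found; a plain pattern would consume the shared residue and report only the first.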

Metalloproteases contain a catalytic zinc metal center which participates in the hydrolysis
of the peptide backbone (reviewed in Power and Harper, in Protease Inhibitors, A. J. Barrett and
G. Salvesen (eds.) Elsevier, Amsterdam, 1986, p. 219). The active zinc center differentiates
some of these proteases from calpains and trypsins whose activities are dependent upon the
presence of calcium. Examples of metalloproteases include carboxypeptidase A,
carboxypeptidase B, and thermolysin.

Metalloproteases have been isolated from a number of procaryotic and eucaryotic
sources, e.g. Bacillus subtilis (McConn et al., 1964, J. Biol. Chem. 239:3706); Bacillus
megaterium; Serratia (Miyata et al., 1971, Agr. Biol. Chem. 35:460); Clostridium bifermentans
(MacFarlane et al., 1992, App. Environ. Microbiol. 58:1195-1200), Legionella pneumophila
(Moffet et al., 1994, Infection and Immunity 62:751-3). In particular, acidic metalloproteases
have been isolated from broad-banded copperhead venoms (Johnson and Ownby, 1993, Int. J.
Biochem. 25:267-278), rattlesnake venoms (Chlou et al., 1992, Biochem. Biophys. Res.
Commun. 187:389-396) and articular cartilage (Treadwell et al., 1986, Arch. Biochem. Biophys.
251:715-723). Neutral metalloproteases, specifically those having optimal activity at neutral pH
have, for example, been isolated from Aspergillus sojae (Sekine, 1973, Agric. Biol. Chem.
37:1945-1952). Neutral metalloproteases obtained from Aspergillus have been classified into
two groups, npI and npII (Sekine, 1972, Agric. Biol. Chem. 36:207-216). So far, success in
obtaining amino acid sequence information from these fungal neutral metalloproteases has been
limited. An npII metalloprotease isolated from Aspergillus oryzae has been cloned based on
amino acid sequence presented in the literature (Tatsumi et al., 1991, Mol. Gen. Genet.
228:97-103). However, to date, no npI fungal metalloprotease has been cloned or sequenced. Alkaline
metalloproteases, for example, have been isolated from Pseudomonas aeruginosa (Baumann et
al., 1993, EMBO J 12:3357-3364) and the insect pathogen Xenorhabdus luminescens (Schmidt
et al., 1998, Appl. Environ. Microbiol. 54:2793-2797).

Metalloproteases have been divided into several distinct families based primarily on
activity and structure: 1) water nucleophile; water bound by single zinc ion ligated to two His
(within the motif HEXXH) and Glu, His or Asp; 2) water nucleophile; water bound by single
zinc ion ligated to His, Glu (within the motif HXXE) and His; 3) water nucleophile; water bound
by single zinc ion ligated to His, Asp and His; 4) water nucleophile; water bound by single zinc
ion ligated to two His (within the motif HXXEH) and Glu and 5) water nucleophile; water bound
by two zinc ions ligated by Lys, Asp, Asp, Asp, Glu.

Examples of members of the metalloproteinase family include, but are not limited to,
membrane alanyl aminopeptidase (Homo sapiens), germinal peptidyl-dipeptidase A (Homo
sapiens), thimet oligopeptidase (Rattus norvegicus), oligopeptidase F (Lactococcus lactis),
mycolysin (Streptomyces cacaoi), immune inhibitor A (Bacillus thuringiensis), snapalysin
(Streptomyces lividans), leishmanolysin (Leishmania major), microbial collagenase (Vibrio
alginolyticus), microbial collagenase, class I (Clostridium perfringens), collagenase 1 (Homo
sapiens), serralysin (Serratia marcescens), fragilysin (Bacteroides fragilis), gametolysin
(Chlamydomonas reinhardtii), astacin (Astacus fluviatilis), adamalysin (Crotalus adamanteus),
ADAM 10 (Bos taurus), neprilysin (Homo sapiens), carboxypeptidase A (Homo sapiens),
carboxypeptidase E (Bos taurus), gamma-D-glutamyl-(L)-meso-diaminopimelate peptidase I
(Bacillus sphaericus), vanY D-Ala-D-Ala carboxypeptidase (Enterococcus faecium), endolysin
(bacteriophage A118), pitrilysin (Escherichia coli), mitochondrial processing peptidase
(Saccharomyces cerevisiae), leucyl aminopeptidase (Bos taurus), aminopeptidase I
(Saccharomyces cerevisiae), membrane dipeptidase (Homo sapiens), glutamate carboxypeptidase
(Pseudomonas sp.), Gly-X carboxypeptidase (Saccharomyces cerevisiae), O-sialoglycoprotein
endopeptidase (Pasteurella haemolytica), beta-lytic metalloendopeptidase (Achromobacter
lyticus), methionyl aminopeptidase I (Escherichia coli), X-Pro aminopeptidase (Escherichia
coli), X-His dipeptidase (Escherichia coli), IgA1-specific metalloendopeptidase (Streptococcus
sanguis), tentoxilysin (Clostridium tetani), leucyl aminopeptidase (Vibrio proteolyticus),
aminopeptidase (Streptomyces griseus), IAP aminopeptidase (Escherichia coli), aminopeptidase
T (Thermus aquaticus), hyicolysin (Staphylococcus hyicus), carboxypeptidase Taq (Thermus
aquaticus), anthrax lethal factor (Bacillus anthracis), penicillolysin (Penicillium citrinum),
fungalysin (Aspergillus fumigatus), lysostaphin (Staphylococcus simulans), beta-aspartyl
dipeptidase (Escherichia coli), carboxypeptidase Ss1 (Sulfolobus solfataricus), FtsH
endopeptidase (Escherichia coli), glutamyl aminopeptidase (Lactococcus lactis), cytophagalysin
(Cytophaga sp.), metalloendopeptidase (vaccinia virus), VanX D-Ala-D-Ala dipeptidase
(Enterococcus faecium), Ste24p endopeptidase (Saccharomyces cerevisiae), dipeptidyl-peptidase
III (Rattus norvegicus), S2P protease (Homo sapiens), sporulation factor SpoIVFB (Bacillus
subtilis), and HYBD endopeptidase (Escherichia coli).

Metalloproteases have been found to have a number of uses. For example, there is strong 
evidence that a metalloprotease is involved in the in vivo proteolytic processing of the 
vasoconstrictor, endothelin-1. Rat metalloprotease has been found to be involved in peptide 
hormone processing. One important subfamily of the metalloproteases is the matrix
metalloproteases.

A number of diseases are thought to be mediated by excess or undesired metalloprotease 
activity or by an imbalance in the ratio of the various members of the protease family of proteins. 
These include: a) osteoarthritis (Woessner, et al., J. Biol. Chem. 259(6), 3633, 1984; Phadke, et
al., J. Rheumatol. 10, 852, 1983), b) rheumatoid arthritis (Mullins, et al., Biochim. Biophys. Acta
695, 117, 1983; Woolley, et al., Arthritis Rheum. 20, 1231, 1977; Gravallese, et al., Arthritis
Rheum. 34, 1076, 1991), c) septic arthritis (Williams, et al., Arthritis Rheum. 33, 533, 1990), d)
tumor metastasis (Reich, et al., Cancer Res. 48, 3307, 1988, and Matrisian, et al., Proc. Nat'l.
Acad. Sci., USA 83, 9413, 1986), e) periodontal diseases (Overall, et al., J. Periodontal Res. 22,
81, 1987), f) corneal ulceration (Burns, et al., Invest. Ophthalmol. Vis. Sci. 30, 1569, 1989), g)
proteinuria (Baricos, et al., Biochem. J. 254, 609, 1988), h) coronary thrombosis from
atherosclerotic plaque rupture (Henney, et al., Proc. Natl. Acad. Sci., USA 88, 8154-8158,
1991), i) aneurysmal aortic disease (Vine, et al., Clin. Sci. 81, 233, 1991), j) birth control
(Woessner, et al., Steroids 54, 491, 1989), k) dystrophic epidermolysis bullosa (Kronberger, et
al., J. Invest. Dermatol. 79, 208, 1982), l) degenerative cartilage loss following traumatic
joint injury, m) conditions leading to inflammatory responses, osteopenias mediated by MMP
activity, n) temporomandibular joint disease, and o) demyelinating diseases of the nervous system
(Chantry, et al., J. Neurochem. 50, 688, 1988).

Aspartic protease 

Aspartic proteases have been divided into several distinct families based primarily on
activity and structure. These include 1) water nucleophile; water bound by two Asp from
monomer or dimer; all endopeptidases, from eukaryote organisms, viruses or virus-like
organisms and 2) endopeptidases that are water nucleophile and are water bound by Asp and
Asn.

Most aspartic proteases belong to the pepsin family. The pepsin family includes
digestive enzymes such as pepsin and chymosin as well as lysosomal cathepsins D and
processing enzymes such as renin, and certain fungal proteases (penicillopepsin, rhizopuspepsin,
endothiapepsin). A second family comprises viral proteases such as the protease from the AIDS
virus (HIV) also called retropepsin. Crystallographic studies have shown that these enzymes are
bilobed molecules with the active site located between two homologous lobes. Each lobe
contributes one aspartate residue of the catalytically active diad of aspartates. These two aspartyl
residues are in close geometric proximity in the active molecule and one aspartate is ionized
whereas the second one is unionized at the optimum pH range of 2-3. Retropepsins are
monomeric, i.e. carry only one catalytic aspartate, and dimerization is therefore required to form
an active enzyme.

In contrast to serine and cysteine proteases, catalysis by aspartic proteases does not involve a
covalent intermediate though a tetrahedral intermediate exists. The nucleophilic attack is
achieved by two simultaneous proton transfers: one from a water molecule to the diad of the two
carboxyl groups and a second one from the diad to the carbonyl oxygen of the substrate with the
concurrent CO-NH bond cleavage. This general acid-base catalysis, which may be called a
"push-pull" mechanism, leads to the formation of a non covalent neutral tetrahedral intermediate.

Examples of the aspartic protease family of proteins include, but are not limited to, 
pepsin A (Homo sapiens), HIV1 retropepsin (human immunodeficiency virus type 1),
endopeptidase (cauliflower mosaic virus), bacilliform virus putative protease (rice tungro
bacilliform virus), aspergillopepsin II (Aspergillus niger), thermopsin (Sulfolobus
acidocaldarius), nodavirus endopeptidase (flock house virus), pseudomonapepsin (Pseudomonas
sp. 101), signal peptidase II (Escherichia coli), polyprotein peptidase (human spumaretrovirus),
copia transposon (Drosophila melanogaster), SIRE-1 peptidase (Glycine max), retrotransposon
bs1 endopeptidase (Zea mays), retrotransposon peptidase (Drosophila buzzatii), Tas
retrotransposon peptidase (Ascaris lumbricoides), Pao retrotransposon peptidase (Bombyx mori),
putative proteinase of Skippy retrotransposon (Fusarium oxysporum), tetravirus endopeptidase
(Nudaurelia capensis omega virus), presenilin 1 (Homo sapiens).

Proteases and Cancer 

Proteases are critical elements at several stages in the progression of metastatic cancer. In
this process, the proteolytic degradation of structural protein in the basal membrane allows for
expansion of a tumor in the primary site, evasion from this site as well as homing and invasion in
distant, secondary sites. Also, tumor induced angiogenesis is required for tumor growth and is
dependent on proteolytic tissue remodeling. Transfection experiments with various types of
proteases have shown that the matrix metalloproteases play a dominant role in these processes, in
particular gelatinases A and B (MMP-2 and MMP-9, respectively). For an overview of this field
see Mullins, et al., Biochim. Biophys. Acta 695, 177, 1983; Ray, et al., Eur. Respir. J. 7, 2062,
1994; Birkedal-Hansen, et al., Crit. Rev. Oral Biol. Med. 4, 197, 1993.

Furthermore, it was demonstrated that inhibition of degradation of extracellular matrix by
the native matrix metalloprotease inhibitor TIMP-2 (a protein) arrests cancer growth (DeClerck,
et al., Cancer Res. 52, 701, 1992) and that TIMP-2 inhibits tumor-induced angiogenesis in
experimental systems (Moses, et al. Science 248, 1408, 1990). For a review, see DeClerck, et al.,
Ann. N. Y. Acad. Sci. 732, 222, 1994. It was further demonstrated that the synthetic matrix
metalloprotease inhibitor batimastat when given intraperitoneally inhibits human colon tumor
growth and spread in an orthotopic model in nude mice (Wang, et al. Cancer Res. 54, 4726,
1994) and prolongs the survival of mice bearing human ovarian carcinoma xenografts (Davies,
et al., Cancer Res. 53, 2087, 1993). The use of this and related compounds has been described in
Brown, et al., WO-9321942 A2.

There are several patents and patent applications claiming the use of metalloproteinase
inhibitors for the retardation of metastatic cancer, promoting tumor regression, inhibiting cancer
cell proliferation, slowing or preventing cartilage loss associated with osteoarthritis or for
treatment of other diseases as noted above (e.g. Levy, et al., WO-9519965 A1; Beckett, et al.,
WO-9519956 A1; Beckett, et al., WO-9519957 A1; Beckett, et al., WO-9519961 A1; Brown, et
al., WO-9321942 A2; Crimmin, et al., WO-9421625 A1; Dickens, et al., U.S. Pat. No.
4,599,361; Hughes, et al., U.S. Pat. No. 5,190,937; Broadhurst, et al., EP 574758 A1;
Broadhurst, et al., EP 276436; and Myers, et al., EP 520573 A1).



Protease proteins, particularly members of the serine subfamily, are a major target for drug
action and development. Accordingly, it is valuable to the field of pharmaceutical development to
identify and characterize previously unknown members of this subfamily of protease proteins. The
present invention advances the state of the art by providing previously unidentified human
protease proteins that have homology to members of the serine subfamily.

SUMMARY OF THE INVENTION 

The present invention is based in part on the identification of amino acid sequences of 
human protease peptides and proteins that are related to the serine protease subfamily, as well as 
allelic variants and other mammalian orthologs thereof. These unique peptide sequences, and 
nucleic acid sequences that encode these peptides, can be used as models for the development of
human therapeutic targets, aid in the identification of therapeutic proteins, and serve as targets
for the development of human therapeutic agents that modulate protease activity in cells and
tissues that express the protease. Experimental data as provided in Figure 1 indicates expression
in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in
cancers.

DESCRIPTION OF THE FIGURE SHEETS

FIGURE 1 provides the nucleotide sequence of a cDNA molecule that encodes the 
protease protein of the present invention. (SEQ ID NO:1) In addition, structure and functional
information is provided, such as ATG start, stop and tissue distribution, where available, that
allows one to readily determine specific uses of inventions based on this molecular sequence.
Experimental data as provided in Figure 1 indicates expression in humans in testis, placenta, 
fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. 

FIGURE 2 provides the predicted amino acid sequence of the protease of the present 
invention. (SEQ ID NO:2) In addition structure and functional information such as protein 
family, function, and modification sites is provided where available, allowing one to readily 
determine specific uses of inventions based on this molecular sequence. 

FIGURE 3 provides genomic sequences that span the gene encoding the protease protein 
of the present invention. (SEQ ID NO:3) In addition structure and functional information, such 
as intron/exon structure, promoter location, etc., is provided where available, allowing one to 
readily determine specific uses of inventions based on this molecular sequence. As indicated in
Figure 3, SNPs, including insertion/deletion polymorphisms ("indels"), were identified at 69
different nucleotide positions in and around the gene encoding the serine protease protein of the
present invention.

DETAILED DESCRIPTION OF THE INVENTION 
General Description 

The present invention is based on the sequencing of the human genome. During the 
sequencing and assembly of the human genome, analysis of the sequence information revealed
previously unidentified fragments of the human genome that encode peptides that share
structural and/or sequence homology to protein/peptide/domains identified and characterized
within the art as being a protease protein or part of a protease protein and are related to the serine
protease subfamily. Utilizing these sequences, additional genomic sequences were assembled
and transcript and/or cDNA sequences were isolated and characterized. Based on this analysis,
the present invention provides amino acid sequences of human protease peptides and proteins
that are related to the serine protease subfamily, nucleic acid sequences in the form of transcript
sequences, cDNA sequences and/or genomic sequences that encode these protease peptides and
proteins, nucleic acid variation (allelic information), tissue distribution of expression, and
information about the closest art known protein/peptide/domain that has structural or sequence
homology to the protease of the present invention.

In addition to being previously unknown, the peptides that are provided in the present 
invention are selected based on their ability to be used for the development of commercially 
important products and services. Specifically, the present peptides are selected based on
homology and/or structural relatedness to known protease proteins of the serine protease
subfamily and the expression pattern observed. Experimental data as provided in Figure 1
indicates expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain,
bone marrow, and in cancers. The art has clearly established the commercial importance of
members of this family of proteins and proteins that have expression patterns similar to that of
the present gene. Some of the more specific features of the peptides of the present invention, and
the uses thereof, are described herein, particularly in the Background of the Invention and in the
annotation provided in the Figures, and/or are known within the art for each of the known serine
family or subfamily of protease proteins.

Specific Embodiments 
Peptide Molecules 

The present invention provides nucleic acid sequences that encode protein molecules that
have been identified as being members of the protease family of proteins and are related to the
serine protease subfamily (protein sequences are provided in Figure 2, transcript/cDNA
sequences are provided in Figure 1 and genomic sequences are provided in Figure 3). The
peptide sequences provided in Figure 2, as well as the obvious variants described herein,
particularly allelic variants as identified herein and using the information in Figure 3, will be
referred to herein as the protease peptides of the present invention, protease peptides, or
peptides/proteins of the present invention.

The present invention provides isolated peptide and protein molecules that consist of, 
consist essentially of, or comprise the amino acid sequences of the protease peptides disclosed in 
the Figure 2, (encoded by the nucleic acid molecule shown in Figure 1, transcript/cDNA or 
Figure 3, genomic sequence), as well as all obvious variants of these peptides that are within the 
art to make and use. Some of these variants are described in detail below. 

As used herein, a peptide is said to be "isolated" or "purified" when it is substantially free 
of cellular material or free of chemical precursors or other chemicals. The peptides of the present 
invention can be purified to homogeneity or other degrees of purity. The level of purification will 
be based on the intended use. The critical feature is that the preparation allows for the desired 
function of the peptide, even if in the presence of considerable amounts of other components (the 
features of an isolated nucleic acid molecule are discussed below). 

In some uses, "substantially free of cellular material" includes preparations of the peptide 
having less than about 30% (by dry weight) other proteins (i.e., contaminating protein), less than 
about 20% other proteins, less than about 10% other proteins, or less than about 5% other proteins. 
When the peptide is recombinantly produced, it can also be substantially free of culture medium, 
i.e., culture medium represents less than about 20% of the volume of the protein preparation. 

The language "substantially free of chemical precursors or other chemicals" includes 
preparations of the peptide in which it is separated from chemical precursors or other chemicals that 
are involved in its synthesis. In one embodiment, the language "substantially free of chemical 
precursors or other chemicals" includes preparations of the protease peptide having less than about 
30% (by dry weight) chemical precursors or other chemicals, less than about 20% chemical 
precursors or other chemicals, less than about 10% chemical precursors or other chemicals, or less 
than about 5% chemical precursors or other chemicals. 

The isolated protease peptide can be purified from cells that naturally express it, purified 
from cells that have been altered to express it (recombinant), or synthesized using known protein 
synthesis methods. Experimental data as provided in Figure 1 indicates expression in humans in 
testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. For 
example, a nucleic acid molecule encoding the protease peptide is cloned into an expression vector, 
the expression vector introduced into a host cell and the protein expressed in the host cell. The 
protein can then be isolated from the cells by an appropriate purification scheme using standard 
protein purification techniques. Many of these techniques are described in detail below. 


Accordingly, the present invention provides proteins that consist of the amino acid 
sequences provided in Figure 2 (SEQ ID NO:2), for example, proteins encoded by the 
transcript/cDNA nucleic acid sequences shown in Figure 1 (SEQ ID NO:1) and the genomic 
sequences provided in Figure 3 (SEQ ID NO:3). The amino acid sequence of such a protein is 
provided in Figure 2. A protein consists of an amino acid sequence when the amino acid sequence 
is the final amino acid sequence of the protein. 

The present invention further provides proteins that consist essentially of the amino acid 
sequences provided in Figure 2 (SEQ ID NO:2), for example, proteins encoded by the 
transcript/cDNA nucleic acid sequences shown in Figure 1 (SEQ ID NO:1) and the genomic 
sequences provided in Figure 3 (SEQ ID NO:3). A protein consists essentially of an amino acid 
sequence when such an amino acid sequence is present with only a few additional amino acid 
residues, for example from about 1 to about 100 or so additional residues, typically from 1 to about 
20 additional residues in the final protein. 

The present invention further provides proteins that comprise the amino acid sequences 
provided in Figure 2 (SEQ ID NO:2), for example, proteins encoded by the transcript/cDNA nucleic 
acid sequences shown in Figure 1 (SEQ ID NO:1) and the genomic sequences provided in Figure 3 
(SEQ ID NO:3). A protein comprises an amino acid sequence when the amino acid sequence is at 
least part of the final amino acid sequence of the protein. In such a fashion, the protein can be only 
the peptide or have additional amino acid molecules, such as amino acid residues (contiguous 
encoded sequence) that are naturally associated with it or heterologous amino acid residues/peptide 
sequences. Such a protein can have a few additional amino acid residues or can comprise several 
hundred or more additional amino acids. The preferred classes of proteins that are comprised of the 
protease peptides of the present invention are the naturally occurring mature proteins. A brief 
description of how various types of these proteins can be made/isolated is provided below. 
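
By way of illustration only, the distinction drawn above among "consists of", "consists essentially 
of", and "comprises" can be sketched in code. The following is a minimal sketch under stated 
assumptions: the function name classify_relation, the one-letter sequence strings, and the use of 100 
additional residues as a hard cutoff are hypothetical choices made for the example (the text says 
"about 1 to about 100 or so"), and the function reports only the narrowest category that applies. 

```python
def classify_relation(protein: str, reference: str, max_extra: int = 100) -> str:
    """Return the narrowest category relating `protein` to `reference`.

    Both arguments are one-letter amino acid strings. `max_extra` is an
    illustrative cutoff for "only a few additional residues"; it is an
    assumption of this sketch, not a limit stated as exact in the text.
    """
    if protein == reference:
        # The reference is the final amino acid sequence of the protein.
        return "consists of"
    if reference in protein:
        extra = len(protein) - len(reference)
        if extra <= max_extra:
            # Reference present with only a few additional residues.
            return "consists essentially of"
        # Reference is at least part of the final sequence.
        return "comprises"
    return "does not comprise"


# Hypothetical toy sequences, not taken from Figure 2.
reference = "MKTLLVAG"
print(classify_relation("MKTLLVAG", reference))
print(classify_relation("HHHHHH" + reference, reference))
print(classify_relation("G" * 300 + reference, reference))
```

Note that every protein in the first two categories also "comprises" the reference sequence in the 
sense used above; the sketch simply reports the most specific label. 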

The protease peptides of the present invention can be attached to heterologous sequences to 
form chimeric or fusion proteins. Such chimeric and fusion proteins comprise a protease peptide 
operatively linked to a heterologous protein having an amino acid sequence not substantially 
homologous to the protease peptide. "Operatively linked" indicates that the protease peptide and the 
heterologous protein are fused in-frame. The heterologous protein can be fused to the N-terminus 
or C-terminus of the protease peptide. 

In some uses, the fusion protein does not affect the activity of the protease peptide per se. 
For example, the fusion protein can include, but is not limited to, enzymatic fusion proteins, for 
example beta-galactosidase fusions, yeast two-hybrid GAL fusions, poly-His fusions, MYC-tagged, 
HI-tagged and Ig fusions. Such fusion proteins, particularly poly-His fusions, can facilitate the 
purification of recombinant protease peptide. In certain host cells (e.g., mammalian host cells), 
expression and/or secretion of a protein can be increased by using a heterologous signal sequence. 

A chimeric or fusion protein can be produced by standard recombinant DNA techniques. 
For example, DNA fragments coding for the different protein sequences are ligated together 
in-frame in accordance with conventional techniques. In another embodiment, the fusion gene can be 
synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR 
amplification of gene fragments can be carried out using anchor primers which give rise to 
complementary overhangs between two consecutive gene fragments which can subsequently be 
annealed and re-amplified to generate a chimeric gene sequence (see Ausubel et al., Current 
Protocols in Molecular Biology, 1992). Moreover, many expression vectors are commercially 
available that already encode a fusion moiety (e.g., a GST protein). A protease peptide-encoding 
nucleic acid can be cloned into such an expression vector such that the fusion moiety is linked 
in-frame to the protease peptide. 

As mentioned above, the present invention also provides and enables obvious variants of the 
amino acid sequence of the proteins of the present invention, such as naturally occurring mature 
forms of the peptide, allelic/sequence variants of the peptides, non-naturally occurring 
recombinantly derived variants of the peptides, and orthologs and paralogs of the peptides. Such 
variants can readily be generated using art-known techniques in the fields of recombinant nucleic 
acid technology and protein biochemistry. It is understood, however, that variants exclude any 
amino acid sequences disclosed prior to the invention. 

Such variants can readily be identified/made using molecular techniques and the sequence 
information disclosed herein. Further, such variants can readily be distinguished from other 
peptides based on sequence and/or structural homology to the protease peptides of the present 
invention. The degree of homology/identity present will be based primarily on whether the peptide 
is a functional variant or non-functional variant, the amount of divergence present in the paralog 
family and the evolutionary distance between the orthologs. 

To determine the percent identity of two amino acid sequences or two nucleic acid 
sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be 
introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal 
alignment and non-homologous sequences can be disregarded for comparison purposes). In a 
preferred embodiment, at least 30%, 40%, 50%, 60%, 70%, 80%, or 90% or more of the length 
of a reference sequence is aligned for comparison purposes. The amino acid residues or 
nucleotides at corresponding amino acid positions or nucleotide positions are then compared. 
When a position in the first sequence is occupied by the same amino acid residue or nucleotide 
as the corresponding position in the second sequence, then the molecules are identical at that 
position (as used herein amino acid or nucleic acid "identity" is equivalent to amino acid or 
nucleic acid "homology"). The percent identity between the two sequences is a function of the 
number of identical positions shared by the sequences, taking into account the number of gaps, 
and the length of each gap, which need to be introduced for optimal alignment of the two 
sequences. 
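
The position-by-position comparison described above can be sketched as follows. This is a minimal 
sketch that assumes the two sequences have already been aligned and padded with "-" at gap 
positions; the function name percent_identity and the choice of denominator (every alignment 
column, including gap columns) are assumptions of the example, since the text states only that gaps 
are taken into account, not a specific formula. 

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Gap columns count toward the denominator, so each introduced gap
    position lowers the reported identity (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap; they hold no residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # A column is identical when both sequences show the same residue.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned toy sequences: 4 identical columns out of 6.
print(round(percent_identity("ACDE-G", "ACDQFG"), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different 
values for the same alignment, so the convention used should be stated alongside any reported figure. 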



WO 02/26947                                                        PCT/US01/29960 



The comparison of sequences and determination of percent identity and similarity 
between two sequences can be accomplished using a mathematical algorithm. (Computational 
Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: 
Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer 
Analysis of Sequence Data, Part 1, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New 
Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press, 1987; and 
Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 
1991). In a preferred embodiment, the percent identity between two amino acid sequences is 
determined using the Needleman and Wunsch (J. Mol. Biol. (48):444-453 (1970)) algorithm 
which has been incorporated into the GAP program in the GCG software package (available at 
http://www.gcg.com), using either a BLOSUM 62 matrix or a PAM250 matrix, and a gap weight 
of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet another preferred 
embodiment, the percent identity between two nucleotide sequences is determined using the 
GAP program in the GCG software package (Devereux, J., et al., Nucleic Acids Res. 12(1):387 
(1984)) (available at http://www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 
40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. In another embodiment, the 
percent identity between two amino acid or nucleotide sequences is determined using the 
algorithm of E. Myers and W. Miller (CABIOS, 4:11-17 (1989)) which has been incorporated 
into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length 
penalty of 12 and a gap penalty of 4. 
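
For orientation, the global alignment approach of Needleman and Wunsch referred to above can be 
sketched as a dynamic-programming routine. The sketch below is not the GAP program and does not 
reproduce its parameters: it assumes a simple match/mismatch score in place of a BLOSUM 62 or 
PAM250 matrix and a single linear gap penalty in place of separate gap and length weights. The 
function name needleman_wunsch and all scoring values are hypothetical. 

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of `a` and `b`; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical toy nucleotide sequences.
print(needleman_wunsch("ACGT", "ACT"))
```

Replacing the match/mismatch rule with a lookup in a substitution matrix, and the linear gap 
penalty with an affine one, would bring the sketch closer to the parameterization described above. 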

The nucleic acid and protein sequences of the present invention can further be used as a 
"query sequence" to perform a search against sequence databases to, for example, identify other 
family members or related sequences. Such searches can be performed using the NBLAST and 
XBLAST programs (version 2.0) of Altschul, et al. (J. Mol. Biol. 215:403-10 (1990)). BLAST 
nucleotide searches can be performed with the NBLAST program, score = 100, wordlength = 12 
to obtain nucleotide sequences homologous to the nucleic acid molecules of the invention. 
BLAST protein searches can be performed with the XBLAST program, score = 50, wordlength = 
3 to obtain amino acid sequences homologous to the proteins of the invention. To obtain gapped 
alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et 
al. (Nucleic Acids Res. 25(17):3389-3402 (1997)). When utilizing BLAST and gapped BLAST 
programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can 
be used. 

Full-length pre-processed forms, as well as mature processed forms, of proteins that 
comprise one of the peptides of the present invention can readily be identified as having complete 
sequence identity to one of the protease peptides of the present invention as well as being encoded 
by the same genetic locus as the protease peptide provided herein. The gene provided by the present 
invention is located on a genome component that has been mapped to human chromosome 4 (as 
indicated in Figure 3), which is supported by multiple lines of evidence, such as STS and BAC map 
data. 

Allelic variants of a protease peptide can readily be identified as being a human protein 
having a high degree (significant) of sequence homology/identity to at least a portion of the protease 
peptide as well as being encoded by the same genetic locus as the protease peptide provided herein. 
Genetic locus can readily be determined based on the genomic information provided in Figure 3, 
such as the genomic sequence mapped to the reference human. The gene provided by the present 
invention is located on a genome component that has been mapped to human chromosome 4 (as 
indicated in Figure 3), which is supported by multiple lines of evidence, such as STS and BAC map 
data. As used herein, two proteins (or a region of the proteins) have significant homology when 
the amino acid sequences are typically at least about 70-80%, 80-90%, and more typically at 
least about 90-95% or more homologous. A significantly homologous amino acid sequence, 
according to the present invention, will be encoded by a nucleic acid sequence that will hybridize 
to a protease peptide encoding nucleic acid molecule under stringent conditions as more fully 
described below. 

Figure 3 provides information on SNPs that have been identified in the gene encoding the 
protease protein of the present invention. SNPs, including indels (indicated by a "-"), were 
identified at 69 different nucleotide positions. Non-synonymous cSNPs were identified at position 
30496. The changes in the amino acid sequence caused by these SNPs are indicated in Figure 3 and 
can readily be determined using the universal genetic code and the protein sequence provided in 
Figure 2 as a reference. SNPs outside the ORF and in introns may affect control/regulatory 
elements. 
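
How the universal genetic code is used to decide whether a coding SNP changes the encoded amino 
acid can be sketched as follows. The codon and base change used in the example are hypothetical and 
are not taken from Figure 3; the function name snp_effect and the zero-based position argument are 
likewise assumptions of this sketch. The codon table is the standard genetic code written in the 
conventional T, C, A, G base order. 

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def snp_effect(codon: str, pos_in_codon: int, alt_base: str) -> tuple[str, str, str]:
    """Return (reference aa, variant aa, classification) for one base change.

    `pos_in_codon` is 0, 1, or 2 (zero-based position within the codon).
    """
    ref_aa = CODON_TABLE[codon]
    alt_codon = codon[:pos_in_codon] + alt_base + codon[pos_in_codon + 1:]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


# Hypothetical examples: GAA -> GAG keeps Glu; GAA -> AAA changes Glu to Lys.
print(snp_effect("GAA", 2, "G"))
print(snp_effect("GAA", 0, "A"))
```

In practice the codon containing a SNP would be located from the reading frame of the coding 
sequence; that bookkeeping is omitted here. 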

Paralogs of a protease peptide can readily be identified as having some degree of significant 
sequence homology/identity to at least a portion of the protease peptide, as being encoded by a gene 
from humans, and as having similar activity or function. Two proteins will typically be considered 
paralogs when the amino acid sequences are typically at least about 60% or greater, and more 
typically at least about 70% or greater homology through a given region or domain. Such 
paralogs will be encoded by a nucleic acid sequence that will hybridize to a protease peptide 
encoding nucleic acid molecule under moderate to stringent conditions as more fully described 
below. 

Orthologs of a protease peptide can readily be identified as having some degree of 
significant sequence homology/identity to at least a portion of the protease peptide as well as being 
encoded by a gene from another organism. Preferred orthologs will be isolated from mammals, 
preferably primates, for the development of human therapeutic targets and agents. Such orthologs 
will be encoded by a nucleic acid sequence that will hybridize to a protease peptide encoding 
nucleic acid molecule under moderate to stringent conditions, as more fully described below, 
depending on the degree of relatedness of the two organisms yielding the proteins. The gene 
provided by the present invention is located on a genome component that has been mapped to 
human chromosome 4 (as indicated in Figure 3), which is supported by multiple lines of 
evidence, such as STS and BAC map data. 

Figure 3 provides information on SNPs that have been identified in the gene encoding the 
protease protein of the present invention. SNPs, including indels (indicated by a "-"), were 
identified at 69 different nucleotide positions. Non-synonymous cSNPs were identified at position 
30496. The changes in the amino acid sequence caused by these SNPs are indicated in Figure 3 and 
can readily be determined using the universal genetic code and the protein sequence provided in 
Figure 2 as a reference. SNPs outside the ORF and in introns may affect control/regulatory 
elements. 

Non-naturally occurring variants of the protease peptides of the present invention can 
readily be generated using recombinant techniques. Such variants include, but are not limited to 
deletions, additions and substitutions in the amino acid sequence of the protease peptide. For 
example, one class of substitutions is conserved amino acid substitutions. Such substitutions are 
those that substitute a given amino acid in a protease peptide by another amino acid of like 
characteristics. Typically seen as conservative substitutions are the replacements, one for another, 
among the aliphatic amino acids Ala, Val, Leu, and Ile; interchange of the hydroxyl residues Ser 
and Thr; exchange of the acidic residues Asp and Glu; substitution between the amide residues Asn 
and Gln; exchange of the basic residues Lys and Arg; and replacements among the aromatic 
residues Phe and Tyr. Guidance concerning which amino acid changes are likely to be 
phenotypically silent is found in Bowie et al., Science 247:1306-1310 (1990). 
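
The conservative substitution classes listed above can be expressed as a simple lookup. This is a 
minimal sketch: the groupings are exactly those named in the preceding paragraph, written in 
one-letter code, while the function name is_conservative is hypothetical and the rule that a residue 
replaced by itself is not counted as a substitution is an assumption of the example. 

```python
# Residue classes named in the text, in one-letter code.
CONSERVATIVE_GROUPS = [
    {"A", "V", "L", "I"},  # aliphatic: Ala, Val, Leu, Ile
    {"S", "T"},            # hydroxyl: Ser, Thr
    {"D", "E"},            # acidic: Asp, Glu
    {"N", "Q"},            # amide: Asn, Gln
    {"K", "R"},            # basic: Lys, Arg
    {"F", "Y"},            # aromatic: Phe, Tyr
]


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues differ and fall in the same listed class."""
    if original == replacement:
        return False  # not a substitution at all
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)


print(is_conservative("L", "I"))  # Leu -> Ile, both aliphatic
print(is_conservative("D", "K"))  # Asp -> Lys, acidic to basic
```

Residues not named in any class (for example Gly, Pro, Cys, Trp) are never reported as 
conservative by this sketch, which reflects only the list given above. 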

Variant protease peptides can be fully functional or can lack function in one or more 
activities, e.g. ability to bind substrate, ability to cleave substrate, ability to participate in a signaling 
pathway, etc. Fully functional variants typically contain only conservative variation or variation in 
non-critical residues or in non-critical regions. Figure 2 provides the result of protein analysis and 
can be used to identify critical domains/regions. Functional variants can also contain substitution of 
similar amino acids that result in no change or an insignificant change in function. Alternatively, 
such substitutions may positively or negatively affect function to some degree. 

Non-functional variants typically contain one or more non-conservative amino acid 
substitutions, deletions, insertions, inversions, or truncation or a substitution, insertion, inversion, or 
deletion in a critical residue or critical region. 

Amino acids that are essential for function can be identified by methods known in the art, 
such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham et al., Science 
244:1081-1085 (1989)), particularly using the results provided in Figure 2. The latter procedure 
introduces single alanine mutations at every residue in the molecule. The resulting mutant 
molecules are then tested for biological activity such as protease activity or in assays such as an in 
vitro proliferative activity. Sites that are critical for binding partner/substrate binding can also be 
determined by structural analysis such as crystallization, nuclear magnetic resonance or 
photoaffinity labeling (Smith et al., J. Mol. Biol. 224:899-904 (1992); de Vos et al. Science 
255:306-312 (1992)). 
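
The set of single-residue mutants tested in an alanine scan can be enumerated as sketched below. 
The sketch covers only the in silico bookkeeping, not the mutagenesis or activity assays themselves. 
The function name alanine_scan, the toy sequence, and the decision to skip positions that are 
already alanine are assumptions of the example. 

```python
def alanine_scan(sequence: str) -> list[tuple[str, str]]:
    """List (label, mutant sequence) pairs, one alanine substitution each.

    Labels use the common form <original residue><1-based position>A.
    Positions that are already alanine are skipped (assumed convention).
    """
    mutants = []
    for i, residue in enumerate(sequence):
        if residue == "A":
            continue
        label = f"{residue}{i + 1}A"
        mutants.append((label, sequence[:i] + "A" + sequence[i + 1:]))
    return mutants


# Hypothetical toy peptide, not taken from Figure 2.
for label, mutant in alanine_scan("MKTAL"):
    print(label, mutant)
```

Each listed mutant would then be expressed and compared with the unmodified peptide in the 
activity assays mentioned above. 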

The present invention further provides fragments of the protease peptides, in addition to 
proteins and peptides that comprise and consist of such fragments, particularly those comprising the 
residues identified in Figure 2. The fragments to which the invention pertains, however, are not to 
be construed as encompassing fragments that may be disclosed publicly prior to the present 
invention. 

As used herein, a fragment comprises at least 8, 10, 12, 14, 16, or more contiguous amino 
acid residues from a protease peptide. Such fragments can be chosen based on the ability to retain 
one or more of the biological activities of the protease peptide or could be chosen for the ability to 
perform a function, e.g. bind a substrate or act as an immunogen. Particularly important fragments 
are biologically active fragments, peptides that are, for example, about 8 or more amino acids in 
length. Such fragments will typically comprise a domain or motif of the protease peptide, e.g., 
active site, a transmembrane domain or a substrate-binding domain. Further, possible fragments 
include, but are not limited to, domain or motif containing fragments, soluble peptide fragments, 
and fragments containing immunogenic structures. Predicted domains and functional sites are 
readily identifiable by computer programs well known and readily available to those of skill in the 
art (e.g., PROSITE analysis). The results of one such analysis are provided in Figure 2. 
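
The notion of a fragment as a run of contiguous residues of a stated minimum length can be sketched 
as a sliding window over the peptide sequence. The function name contiguous_fragments and the toy 
sequence are hypothetical; the default window of 8 residues follows the smallest length named above. 

```python
def contiguous_fragments(sequence: str, length: int = 8) -> list[str]:
    """Return every contiguous fragment of exactly `length` residues."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # A sequence shorter than `length` yields no fragments.
    return [sequence[i:i + length]
            for i in range(len(sequence) - length + 1)]


# Hypothetical 10-residue toy peptide: three overlapping 8-residue fragments.
print(contiguous_fragments("MKTLLVAGSW"))
```

Calling the function with lengths of 10, 12, 14, or 16 gives the longer fragment sets in the same 
way; selecting which fragments retain activity or carry a domain remains an experimental question. 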

Polypeptides often contain amino acids other than the 20 amino acids commonly referred to 
as the 20 naturally occurring amino acids. Further, many amino acids, including the terminal amino 
acids, may be modified by natural processes, such as processing and other post-translational 
modifications, or by chemical modification techniques well known in the art. Common 
modifications that occur naturally in protease peptides are described in basic texts, detailed 
monographs, and the research literature, and they are well known to those of skill in the art (some of 
these features are identified in Figure 2). 

Known modifications include, but are not limited to, acetylation, acylation, 
ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, 
covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid 
derivative, covalent attachment of phosphotidylinositol, cross-linking, cyclization, disulfide bond 
formation, demethylation, formation of covalent crosslinks, formation of cystine, formation of 
pyroglutamate, formylation, gamma carboxylation, glycosylation, GPI anchor formation, 
hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic processing, 
phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated 
addition of amino acids to proteins such as arginylation, and ubiquitination. 

Such modifications are well known to those of skill in the art and have been described in 
great detail in the scientific literature. Several particularly common modifications, glycosylation, 
lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and 
ADP-ribosylation, for instance, are described in most basic texts, such as Proteins - Structure and 
Molecular Properties, 2nd Ed., T.E. Creighton, W. H. Freeman and Company, New York (1993). 
Many detailed reviews are available on this subject, such as by Wold, F., Posttranslational Covalent 
Modification of Proteins, B.C. Johnson, Ed., Academic Press, New York 1-12 (1983); Seifter et al. 
(Meth. Enzymol. 182: 626-646 (1990)) and Rattan et al. (Ann. N.Y. Acad. Sci. 663:48-62 (1992)). 

Accordingly, the protease peptides of the present invention also encompass derivatives or 
analogs in which a substituted amino acid residue is not one encoded by the genetic code, in which 
a substituent group is included, in which the mature protease peptide is fused with another 
compound, such as a compound to increase the half-life of the protease peptide (for example, 
polyethylene glycol), or in which the additional amino acids are fused to the mature protease 
peptide, such as a leader or secretory sequence or a sequence for purification of the mature protease 
peptide or a pro-protein sequence. 

Protein/Peptide Uses 

The proteins of the present invention can be used in substantial and specific assays 
related to the functional information provided in the Figures; to raise antibodies or to elicit 
another immune response; as a reagent (including the labeled reagent) in assays designed to 
quantitatively determine levels of the protein (or its binding partner or ligand) in biological 
fluids; and as markers for tissues in which the corresponding protein is preferentially expressed 
(either constitutively or at a particular stage of tissue differentiation or development or in a 
disease state). Where the protein binds or potentially binds to another protein or ligand (such as, 
for example, in a protease-effector protein interaction or protease-ligand interaction), the protein 
can be used to identify the binding partner/ligand so as to develop a system to identify inhibitors 
of the binding interaction. Any or all of these uses are capable of being developed into reagent 
grade or kit format for commercialization as commercial products. 

Methods for performing the uses listed above are well known to those skilled in the art. 
References disclosing such methods include "Molecular Cloning: A Laboratory Manual", 2d ed., 
Cold Spring Harbor Laboratory Press, Sambrook, J., E. F. Fritsch and T. Maniatis eds., 1989, 
and "Methods in Enzymology: Guide to Molecular Cloning Techniques", Academic Press, 
Berger, S. L. and A. R. Kimmel eds., 1987. 

The potential uses of the peptides of the present invention are based primarily on the 
source of the protein as well as the class/action of the protein. For example, proteases isolated 
from humans and their human/mammalian orthologs serve as targets for identifying agents for 
use in mammalian therapeutic applications, e.g. a human drug, particularly in modulating a 
biological or pathological response in a cell or tissue that expresses the protease. Experimental 
data as provided in Figure 1 indicates that protease proteins of the present invention are 
expressed in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone 
marrow, and in cancers. Specifically, a virtual northern blot shows expression in cancers. In 
addition, PCR-based tissue screening panels indicate expression in testis, placenta, fetal lung, 
fetal kidney, fetal heart, fetal brain, and bone marrow. A large percentage of pharmaceutical 
agents are being developed that modulate the activity of protease proteins, particularly members 
of the serine subfamily (see Background of the Invention). The structural and functional 
information provided in the Background and Figures provide specific and substantial uses for the 
molecules of the present invention, particularly in combination with the expression information 
provided in Figure 1. Experimental data as provided in Figure 1 indicates expression in humans 
in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. 
Such uses can readily be determined using the information provided herein, that which is known 
in the art, and routine experimentation. 

The proteins of the present invention (including variants and fragments that may have been 
disclosed prior to the present invention) are useful for biological assays related to proteases that are 
related to members of the serine subfamily. Such assays involve any of the known protease 
functions or activities or properties useful for diagnosis and treatment of protease-related conditions 
that are specific for the subfamily of proteases that the one of the present invention belongs to, 
particularly in cells and tissues that express the protease. Experimental data as provided in Figure 1 
indicates that protease proteins of the present invention are expressed in humans in testis, placenta, 
fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. Specifically, a virtual 
northern blot shows expression in cancers. In addition, PCR-based tissue screening panels indicate 
expression in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, and bone marrow. 

The proteins of the present invention are also useful in drug screening assays, in cell-based 
or cell-free systems. Cell-based systems can be native, i.e., cells that normally express the protease, 
as a biopsy or expanded in cell culture. Experimental data as provided in Figure 1 indicates 
expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone 
marrow, and in cancers. In an alternate embodiment, cell-based assays involve recombinant host 
cells expressing the protease protein. 

The polypeptides can be used to identify compounds that modulate protease activity of the 
protein in its natural state or an altered form that causes a specific disease or pathology associated 
with the protease. Both the proteases of the present invention and appropriate variants and 
fragments can be used in high-throughput screens to assay candidate compounds for the ability to 
bind to the protease. These compounds can be further screened against a functional protease to 
determine the effect of the compound on the protease activity. Further, these compounds can be 
tested in animal or invertebrate systems to determine activity/effectiveness. Compounds can be 
identified that activate (agonist) or inactivate (antagonist) the protease to a desired degree. 

Further, the proteins of the present invention can be used to screen a compound for the 
ability to stimulate or inhibit interaction between the protease protein and a molecule that normally 
interacts with the protease protein, e.g. a substrate or a component of the signal pathway that the 
protease protein normally interacts (for example, a protease). Such assays typically include the 
steps of combining the protease protein with a candidate compound under conditions that allow the 
protease protein, or fragment, to interact with the target molecule, and to detect the formation of a 
complex between the protein and the target or to detect the biochemical consequence of the 
interaction with the protease protein and the target, such as any of the associated effects of signal 
transduction such as protein cleavage, cAMP turnover, and adenylate cyclase activation, etc. 

Candidate compounds include, for example, 1) peptides such as soluble peptides, including 
Ig-tailed fusion peptides and members of random peptide libraries (see, e.g., Lam et al., Nature 
354:82-84 (1991); Houghten et al., Nature 354:84-86 (1991)) and combinatorial chemistry-derived 
molecular libraries made of D- and/or L- configuration amino acids; 2) phosphopeptides (e.g., 
members of random and partially degenerate, directed phosphopeptide libraries, see, e.g., Songyang 
et al., Cell 72:767-778 (1993)); 3) antibodies (e.g., polyclonal, monoclonal, humanized, 
anti-idiotypic, chimeric, and single chain antibodies as well as Fab, F(ab')2, Fab expression library 
fragments, and epitope-binding fragments of antibodies); and 4) small organic and inorganic 
molecules (e.g., molecules obtained from combinatorial and natural product libraries). 

One candidate compound is a soluble fragment of the receptor that competes for substrate
binding. Other candidate compounds include mutant proteases or appropriate fragments containing
mutations that affect protease function and thus compete for substrate. Accordingly, a fragment that
competes for substrate, for example with a higher affinity, or a fragment that binds substrate but
does not allow release, is encompassed by the invention.

The invention further includes other end point assays to identify compounds that modulate
(stimulate or inhibit) protease activity. The assays typically involve an assay of events in the signal
transduction pathway that indicate protease activity. Thus, the cleavage of a substrate,
inactivation/activation of a protein, or a change in the expression of genes that are up- or
down-regulated in response to the protease protein dependent signal cascade can be assayed.

Any of the biological or biochemical functions mediated by the protease can be used as an
endpoint assay. These include all of the biochemical or biochemical/biological events described
herein, in the references cited herein, incorporated by reference for these endpoint assay targets, and
other functions known to those of ordinary skill in the art or that can be readily identified using the
information provided in the Figures, particularly Figure 2. Specifically, a biological function of a
cell or tissues that expresses the protease can be assayed. Experimental data as provided in Figure 1
indicates that protease proteins of the present invention are expressed in humans in testis, placenta,
fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. Specifically, a virtual
northern blot shows expression in cancers. In addition, PCR-based tissue screening panels indicate
expression in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, and bone marrow.

Binding and/or activating compounds can also be screened by using chimeric protease
proteins in which the amino terminal extracellular domain, or parts thereof, the entire
transmembrane domain or subregions, such as any of the seven transmembrane segments or any of
the intracellular or extracellular loops and the carboxy terminal intracellular domain, or parts
thereof, can be replaced by heterologous domains or subregions. For example, a substrate-binding
region can be used that interacts with a different substrate than that which is recognized by the
native protease. Accordingly, a different set of signal transduction components is available as an
end-point assay for activation. This allows for assays to be performed in other than the specific host
cell from which the protease is derived.

The proteins of the present invention are also useful in competition binding assays in
methods designed to discover compounds that interact with the protease (e.g. binding partners
and/or ligands). Thus, a compound is exposed to a protease polypeptide under conditions that allow
the compound to bind or to otherwise interact with the polypeptide. Soluble protease polypeptide is
also added to the mixture. If the test compound interacts with the soluble protease polypeptide, it
decreases the amount of complex formed or activity from the protease target. This type of assay is
particularly useful in cases in which compounds are sought that interact with specific regions of the
protease. Thus, the soluble polypeptide that competes with the target protease region is designed to
contain peptide sequences corresponding to the region of interest.
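
By way of illustration only, the readout of such a competition binding assay might be reduced to
a percent decrease in complex signal when the soluble polypeptide is added. The following sketch
is not part of the disclosed methods; the function names, the signal units, and the 50% cutoff are
assumptions chosen for the example.

```python
def percent_competition(signal_without_soluble: float, signal_with_soluble: float) -> float:
    """Percent decrease in complex signal caused by adding soluble polypeptide.

    Both arguments are raw complex signals (arbitrary units, e.g. counts) measured
    for the same test compound without and with the soluble competitor.
    """
    if signal_without_soluble <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * (1.0 - signal_with_soluble / signal_without_soluble)


def interacts_with_region(signal_without_soluble: float,
                          signal_with_soluble: float,
                          cutoff_percent: float = 50.0) -> bool:
    """Flag a compound whose complex signal drops by at least the cutoff.

    The 50% default is a hypothetical threshold, not a value taken from the text.
    """
    return percent_competition(signal_without_soluble, signal_with_soluble) >= cutoff_percent


if __name__ == "__main__":
    print(percent_competition(1000.0, 250.0))
    print(interacts_with_region(1000.0, 250.0))
```

A large decrease suggests that the compound is being sequestered by the soluble polypeptide,
and therefore that it interacts with the region that polypeptide represents.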

To perform cell free drug screening assays, it is sometimes desirable to immobilize either
the protease protein, or fragment, or its target molecule to facilitate separation of complexes from
uncomplexed forms of one or both of the proteins, as well as to accommodate automation of the
assay.

Techniques for immobilizing proteins on matrices can be used in the drug screening assays.
In one embodiment, a fusion protein can be provided which adds a domain that allows the protein to
be bound to a matrix. For example, glutathione-S-transferase fusion proteins can be adsorbed onto
glutathione sepharose beads (Sigma Chemical, St. Louis, MO) or glutathione derivatized microtitre
plates, which are then combined with the cell lysates (e.g., 35S-labeled) and the candidate
compound, and the mixture incubated under conditions conducive to complex formation (e.g., at
physiological conditions for salt and pH). Following incubation, the beads are washed to remove
any unbound label, and the matrix immobilized and radiolabel determined directly, or in the
supernatant after the complexes are dissociated. Alternatively, the complexes can be dissociated
from the matrix, separated by SDS-PAGE, and the level of protease-binding protein found in the
bead fraction quantitated from the gel using standard electrophoretic techniques. For example,
either the polypeptide or its target molecule can be immobilized utilizing conjugation of biotin and
streptavidin using techniques well known in the art. Alternatively, antibodies reactive with the
protein but which do not interfere with binding of the protein to its target molecule can be
derivatized to the wells of the plate, and the protein trapped in the wells by antibody conjugation.
Preparations of a protease-binding protein and a candidate compound are incubated in the protease
protein-presenting wells and the amount of complex trapped in the well can be quantitated.
Methods for detecting such complexes, in addition to those described above for the
GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with the
protease protein target molecule, or which are reactive with protease protein and compete with the
target molecule, as well as enzyme-linked assays which rely on detecting an enzymatic activity
associated with the target molecule.

Agents that modulate one of the proteases of the present invention can be identified using
one or more of the above assays, alone or in combination. It is generally preferable to use a
cell-based or cell free system first and then confirm activity in an animal or other model system. Such
model systems are well known in the art and can readily be employed in this context.

Modulators of protease protein activity identified according to these drug screening assays
can be used to treat a subject with a disorder mediated by the protease pathway, by treating cells or
tissues that express the protease. Experimental data as provided in Figure 1 indicates expression in
humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in
cancers. These methods of treatment include the steps of administering a modulator of protease
activity in a pharmaceutical composition to a subject in need of such treatment, the modulator being
identified as described herein.

In yet another aspect of the invention, the protease proteins can be used as "bait proteins"
in a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Patent No. 5,283,317; Zervos et al.
(1993) Cell 72:223-232; Madura et al. (1993) J. Biol. Chem. 268:12046-12054; Bartel et al.
(1993) Biotechniques 14:920-924; Iwabuchi et al. (1993) Oncogene 8:1693-1696; and Brent
WO94/10300), to identify other proteins, which bind to or interact with the protease and are
involved in protease activity. Such protease-binding proteins are also likely to be involved in the
propagation of signals by the protease proteins or protease targets as, for example, downstream
elements of a protease-mediated signaling pathway. Alternatively, such protease-binding
proteins are likely to be protease inhibitors.

The two-hybrid system is based on the modular nature of most transcription factors,
which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes two
different DNA constructs. In one construct, the gene that codes for a protease protein is fused to
a gene encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In the
other construct, a DNA sequence, from a library of DNA sequences, that encodes an unidentified
protein ("prey" or "sample") is fused to a gene that codes for the activation domain of the known
transcription factor. If the "bait" and the "prey" proteins are able to interact, in vivo, forming a
protease-dependent complex, the DNA-binding and activation domains of the transcription factor
are brought into close proximity. This proximity allows transcription of a reporter gene (e.g.,
LacZ) which is operably linked to a transcriptional regulatory site responsive to the transcription
factor. Expression of the reporter gene can be detected and cell colonies containing the
functional transcription factor can be isolated and used to obtain the cloned gene which encodes
the protein which interacts with the protease protein.
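
As a purely illustrative model of the logic just described, the reporter can be treated as a
logical AND of three conditions: the bait fusion carries the DNA-binding domain, the prey fusion
carries the activation domain, and bait and prey interact. The sketch below is a toy simulation,
not a laboratory protocol; the prey names and the interaction table are hypothetical.

```python
def reporter_expressed(bait_has_dna_binding_domain: bool,
                       prey_has_activation_domain: bool,
                       bait_and_prey_interact: bool) -> bool:
    """The reporter is transcribed only when the two domains are brought together."""
    return (bait_has_dna_binding_domain
            and prey_has_activation_domain
            and bait_and_prey_interact)


def screen_library(interaction_table: dict) -> list:
    """Return the prey clones whose colonies would express the reporter.

    interaction_table maps a prey clone name to True/False, stating whether that
    prey binds the protease bait (hypothetical input data for this example).
    """
    return sorted(
        prey for prey, binds_bait in interaction_table.items()
        if reporter_expressed(True, True, binds_bait)
    )


if __name__ == "__main__":
    library = {"prey_A": True, "prey_B": False, "prey_C": True}
    print(screen_library(library))
```

Only colonies containing an interacting prey would be carried forward to recover the cloned gene.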

This invention further pertains to novel agents identified by the above-described
screening assays. Accordingly, it is within the scope of this invention to further use an agent
identified as described herein in an appropriate animal model. For example, an agent identified
as described herein (e.g., a protease-modulating agent, an antisense protease nucleic acid
molecule, a protease-specific antibody, or a protease-binding partner) can be used in an animal
or other model to determine the efficacy, toxicity, or side effects of treatment with such an agent.
Alternatively, an agent identified as described herein can be used in an animal or other model to
determine the mechanism of action of such an agent. Furthermore, this invention pertains to uses
of novel agents identified by the above-described screening assays for treatments as described
herein.

The protease proteins of the present invention are also useful to provide a target for
diagnosing a disease or predisposition to disease mediated by the peptide. Accordingly, the
invention provides methods for detecting the presence, or levels of, the protein (or encoding
mRNA) in a cell, tissue, or organism. Experimental data as provided in Figure 1 indicates
expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone
marrow, and in cancers. The method involves contacting a biological sample with a compound
capable of interacting with the protease protein such that the interaction can be detected. Such an
assay can be provided in a single detection format or a multi-detection format such as an antibody
chip array.

One agent for detecting a protein in a sample is an antibody capable of selectively binding to
protein. A biological sample includes tissues, cells and biological fluids isolated from a subject, as
well as tissues, cells and fluids present within a subject.

The peptides of the present invention also provide targets for diagnosing active protein
activity, disease, or predisposition to disease, in a patient having a variant peptide, particularly
activities and conditions that are known for other members of the family of proteins to which the
present one belongs. Thus, the peptide can be isolated from a biological sample and assayed for the
presence of a genetic mutation that results in aberrant peptide. This includes amino acid
substitution, deletion, insertion, rearrangement (as the result of aberrant splicing events), and
inappropriate post-translational modification. Analytic methods include altered electrophoretic
mobility, altered tryptic peptide digest, altered protease activity in cell-based or cell-free assay,
alteration in substrate or antibody-binding pattern, altered isoelectric point, direct amino acid
sequencing, and any other of the known assay techniques useful for detecting mutations in a protein.
Such an assay can be provided in a single detection format or a multi-detection format such as an
antibody chip array.

In vitro techniques for detection of peptide include enzyme linked immunosorbent assays
(ELISAs), Western blots, immunoprecipitations and immunofluorescence using a detection reagent,
such as an antibody or protein binding agent. Alternatively, the peptide can be detected in vivo in a
subject by introducing into the subject a labeled anti-peptide antibody or other types of detection
agent. For example, the antibody can be labeled with a radioactive marker whose presence and
location in a subject can be detected by standard imaging techniques. Particularly useful are
methods that detect the allelic variant of a peptide expressed in a subject and methods which detect
fragments of a peptide in a sample.

The peptides are also useful in pharmacogenomic analysis. Pharmacogenomics deal with
clinically significant hereditary variations in the response to drugs due to altered drug disposition
and abnormal action in affected persons. See, e.g., Eichelbaum, M. (Clin. Exp. Pharmacol. Physiol.
23(10-11):983-985 (1996)), and Linder, M.W. (Clin. Chem. 43(2):254-266 (1997)). The clinical
outcomes of these variations result in severe toxicity of therapeutic drugs in certain individuals or
therapeutic failure of drugs in certain individuals as a result of individual variation in metabolism.
Thus, the genotype of the individual can determine the way a therapeutic compound acts on the
body or the way the body metabolizes the compound. Further, the activity of drug metabolizing
enzymes affects both the intensity and duration of drug action. Thus, the pharmacogenomics of the
individual permit the selection of effective compounds and effective dosages of such compounds for
prophylactic or therapeutic treatment based on the individual's genotype. The discovery of genetic
polymorphisms in some drug metabolizing enzymes has explained why some patients do not obtain
the expected drug effects, show an exaggerated drug effect, or experience serious toxicity from
standard drug dosages. Polymorphisms can be expressed in the phenotype of the extensive
metabolizer and the phenotype of the poor metabolizer. Accordingly, genetic polymorphism may
lead to allelic protein variants of the protease protein in which one or more of the protease functions
in one population is different from those in another population. The peptides thus allow a target to
ascertain a genetic predisposition that can affect treatment modality. Thus, in a ligand-based
treatment, polymorphism may give rise to amino terminal extracellular domains and/or other
substrate-binding regions that are more or less active in substrate binding, and protease activation.
Accordingly, substrate dosage would necessarily be modified to maximize the therapeutic effect
within a given population containing a polymorphism. As an alternative to genotyping, specific
polymorphic peptides could be identified.

The peptides are also useful for treating a disorder characterized by an absence of,
inappropriate, or unwanted expression of the protein. Experimental data as provided in Figure 1
indicates expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain,
bone marrow, and in cancers. Accordingly, methods for treatment include the use of the protease
protein or fragments.



Antibodies 

The invention also provides antibodies that selectively bind to one of the peptides of the
present invention, a protein comprising such a peptide, as well as variants and fragments thereof.
As used herein, an antibody selectively binds a target peptide when it binds the target peptide and
does not significantly bind to unrelated proteins. An antibody is still considered to selectively bind
a peptide even if it also binds to other proteins that are not substantially homologous with the target
peptide so long as such proteins share homology with a fragment or domain of the peptide target of
the antibody. In this case, it would be understood that antibody binding to the peptide is still
selective despite some degree of cross-reactivity.

As used herein, an antibody is defined in terms consistent with that recognized within the
art: they are multi-subunit proteins produced by a mammalian organism in response to an antigen
challenge. The antibodies of the present invention include polyclonal antibodies and monoclonal
antibodies, as well as fragments of such antibodies, including, but not limited to, Fab or F(ab')2, and
Fv fragments.

Many methods are known for generating and/or identifying antibodies to a given target
peptide. Several such methods are described by Harlow, Antibodies, Cold Spring Harbor Press,
(1989).

In general, to generate antibodies, an isolated peptide is used as an immunogen and is
administered to a mammalian organism, such as a rat, rabbit or mouse. The full-length protein, an
antigenic peptide fragment or a fusion protein can be used. Particularly important fragments are
those covering functional domains, such as the domains identified in Figure 2, and domains of
sequence homology or divergence amongst the family, such as those that can readily be identified
using protein alignment methods and as presented in the Figures.

Antibodies are preferably prepared from regions or discrete fragments of the protease
proteins. Antibodies can be prepared from any region of the peptide as described herein.
However, preferred regions will include those involved in function/activity and/or
protease/binding partner interaction. Figure 2 can be used to identify particularly important
regions while sequence alignment can be used to identify conserved and unique sequence
fragments.

An antigenic fragment will typically comprise at least 8 contiguous amino acid residues.
The antigenic peptide can comprise, however, at least 10, 12, 14, 16 or more amino acid residues.
Such fragments can be selected on a physical property, such as fragments that correspond to regions
that are located on the surface of the protein, e.g., hydrophilic regions, or can be selected based on
sequence uniqueness (see Figure 2).
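
For illustration, one conventional way to look for hydrophilic candidate fragments of at least 8
contiguous residues is a sliding-window average over a hydropathy scale. The sketch below uses
the Kyte-Doolittle scale as an assumed choice (the text does not name a scale), and the example
sequence is made up; it is not taken from Figure 2.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilic_fragments(sequence: str, length: int = 8, top: int = 3) -> list:
    """Return the `top` most hydrophilic windows as (start, fragment, mean_score).

    `length` is the fragment size in residues; the text gives 8 as the typical
    minimum, so shorter windows are rejected.
    """
    if length < 8:
        raise ValueError("antigenic fragments are typically at least 8 residues")
    windows = []
    for start in range(len(sequence) - length + 1):
        fragment = sequence[start:start + length]
        mean_score = sum(KYTE_DOOLITTLE[aa] for aa in fragment) / length
        windows.append((mean_score, start, fragment))
    windows.sort()  # lowest (most hydrophilic) mean score first
    return [(start, frag, round(score, 3)) for score, start, frag in windows[:top]]


if __name__ == "__main__":
    toy_sequence = "LLLLLLLLKDEKRDEKLLLL"  # hypothetical sequence for the example
    print(hydrophilic_fragments(toy_sequence, length=8, top=1))
```

A window chosen this way is only a candidate; surface exposure and sequence uniqueness would
still need to be checked against the structural information in the Figures.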

Detection of an antibody of the present invention can be facilitated by coupling (i.e.,
physically linking) the antibody to a detectable substance. Examples of detectable substances
include various enzymes, prosthetic groups, fluorescent materials, luminescent materials,
bioluminescent materials, and radioactive materials. Examples of suitable enzymes include
horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of
suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of
suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate,
rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a
luminescent material includes luminol; examples of bioluminescent materials include luciferase,
luciferin, and aequorin; and examples of suitable radioactive material include radioactive isotopes
such as radioiodine.

Antibody Uses 

The antibodies can be used to isolate one of the proteins of the present invention by standard
techniques, such as affinity chromatography or immunoprecipitation. The antibodies can facilitate
the purification of the natural protein from cells and recombinantly produced protein expressed in
host cells. In addition, such antibodies are useful to detect the presence of one of the proteins of the
present invention in cells or tissues to determine the pattern of expression of the protein among
various tissues in an organism and over the course of normal development. Experimental data as
provided in Figure 1 indicates that protease proteins of the present invention are expressed in
humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in
cancers. Specifically, a virtual northern blot shows expression in cancers. In addition, PCR-based
tissue screening panels indicate expression in testis, placenta, fetal lung, fetal kidney, fetal heart,
fetal brain, and bone marrow. Further, such antibodies can be used to detect protein in situ, in vitro,
or in a cell lysate or supernatant in order to evaluate the abundance and pattern of expression. Also,
such antibodies can be used to assess abnormal tissue distribution or abnormal expression during
development or progression of a biological condition. Antibody detection of circulating fragments
of the full length protein can be used to identify turnover.

Further, the antibodies can be used to assess expression in disease states such as in active
stages of the disease or in an individual with a predisposition toward disease related to the protein's
function. When a disorder is caused by an inappropriate tissue distribution, developmental
expression, level of expression of the protein, or expressed/processed form, the antibody can be
prepared against the normal protein. Experimental data as provided in Figure 1 indicates expression
in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in
cancers. If a disorder is characterized by a specific mutation in the protein, antibodies specific for
this mutant protein can be used to assay for the presence of the specific mutant protein.

The antibodies can also be used to assess normal and aberrant subcellular localization of
cells in the various tissues in an organism. Experimental data as provided in Figure 1 indicates
expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone
marrow, and in cancers. The diagnostic uses can be applied, not only in genetic testing, but also in
monitoring a treatment modality. Accordingly, where treatment is ultimately aimed at correcting
expression level or the presence of aberrant sequence and aberrant tissue distribution or
developmental expression, antibodies directed against the protein or relevant fragments can be used
to monitor therapeutic efficacy.

Additionally, antibodies are useful in pharmacogenomic analysis. Thus, antibodies prepared
against polymorphic proteins can be used to identify individuals that require modified treatment
modalities. The antibodies are also useful as diagnostic tools as an immunological marker for
aberrant protein analyzed by electrophoretic mobility, isoelectric point, tryptic peptide digest, and
other physical assays known to those in the art.

The antibodies are also useful for tissue typing. Experimental data as provided in Figure 1
indicates expression in humans in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain,
bone marrow, and in cancers. Thus, where a specific protein has been correlated with expression in
a specific tissue, antibodies that are specific for this protein can be used to identify a tissue type.

The antibodies are also useful for inhibiting protein function, for example, blocking the
binding of the protease peptide to a binding partner such as a substrate. These uses can also be
applied in a therapeutic context in which treatment involves inhibiting the protein's function. An
antibody can be used, for example, to block binding, thus modulating (agonizing or antagonizing)
the peptide's activity. Antibodies can be prepared against specific fragments containing sites
required for function or against intact protein that is associated with a cell or cell membrane. See
Figure 2 for structural information relating to the proteins of the present invention.

The invention also encompasses kits for using antibodies to detect the presence of a protein
in a biological sample. The kit can comprise antibodies such as a labeled or labelable antibody and
a compound or agent for detecting protein in a biological sample; means for determining the amount
of protein in the sample; means for comparing the amount of protein in the sample with a standard;
and instructions for use. Such a kit can be supplied to detect a single protein or epitope or can be
configured to detect one of a multitude of epitopes, such as in an antibody detection array. Arrays
are described in detail below for nucleic acid arrays and similar methods have been developed for
antibody arrays.


Nucleic Acid Molecules 

The present invention further provides isolated nucleic acid molecules that encode a
protease peptide or protein of the present invention (cDNA, transcript and genomic sequence).
Such nucleic acid molecules will consist of, consist essentially of, or comprise a nucleotide
sequence that encodes one of the protease peptides of the present invention, an allelic variant
thereof, or an ortholog or paralog thereof.

As used herein, an "isolated" nucleic acid molecule is one that is separated from other
nucleic acid present in the natural source of the nucleic acid. Preferably, an "isolated" nucleic acid
is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5' and 3'
ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is
derived. However, there can be some flanking nucleotide sequences, for example up to about 5KB,
4KB, 3KB, 2KB, or 1KB or less, particularly contiguous peptide encoding sequences and peptide
encoding sequences within the same gene but separated by introns in the genomic sequence. The
important point is that the nucleic acid is isolated from remote and unimportant flanking sequences
such that it can be subjected to the specific manipulations described herein such as recombinant
expression, preparation of probes and primers, and other uses specific to the nucleic acid sequences.

Moreover, an "isolated" nucleic acid molecule, such as a transcript/cDNA molecule, can be
substantially free of other cellular material, or culture medium when produced by recombinant
techniques, or chemical precursors or other chemicals when chemically synthesized. However, the
nucleic acid molecule can be fused to other coding or regulatory sequences and still be considered
isolated.

For example, recombinant DNA molecules contained in a vector are considered isolated.
Further examples of isolated DNA molecules include recombinant DNA molecules maintained in
heterologous host cells or purified (partially or substantially) DNA molecules in solution. Isolated
RNA molecules include in vivo or in vitro RNA transcripts of the isolated DNA molecules of the
present invention. Isolated nucleic acid molecules according to the present invention further include
such molecules produced synthetically.

Accordingly, the present invention provides nucleic acid molecules that consist of the
nucleotide sequence shown in Figure 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3,
genomic sequence), or any nucleic acid molecule that encodes the protein provided in Figure 2,
SEQ ID NO:2. A nucleic acid molecule consists of a nucleotide sequence when the nucleotide
sequence is the complete nucleotide sequence of the nucleic acid molecule.

The present invention further provides nucleic acid molecules that consist essentially of the
nucleotide sequence shown in Figure 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3,
genomic sequence), or any nucleic acid molecule that encodes the protein provided in Figure 2,
SEQ ID NO:2. A nucleic acid molecule consists essentially of a nucleotide sequence when such a
nucleotide sequence is present with only a few additional nucleic acid residues in the final nucleic
acid molecule.

The present invention further provides nucleic acid molecules that comprise the nucleotide
sequences shown in Figure 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3, genomic
sequence), or any nucleic acid molecule that encodes the protein provided in Figure 2, SEQ ID
NO:2. A nucleic acid molecule comprises a nucleotide sequence when the nucleotide sequence is at
least part of the final nucleotide sequence of the nucleic acid molecule. In such a fashion, the
nucleic acid molecule can be only the nucleotide sequence or have additional nucleic acid residues,
such as nucleic acid residues that are naturally associated with it or heterologous nucleotide
sequences. Such a nucleic acid molecule can have a few additional nucleotides or can comprise
several hundred or more additional nucleotides. A brief description of how various types of these
nucleic acid molecules can be readily made/isolated is provided below.
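
The three relationships defined above ("consists of", "consists essentially of", "comprises") can
be pictured with a small sketch. It is offered only as an informal aid and has no legal
significance; in particular the text does not quantify "a few additional residues", so the `few`
parameter and its default of 10 are assumptions made for the example.

```python
def sequence_relationship(molecule: str, reference: str, few: int = 10) -> str:
    """Return the narrowest of the three relationships that describes `molecule`.

    molecule  -- full nucleotide sequence of the nucleic acid molecule
    reference -- the nucleotide sequence of interest (e.g. a SEQ ID NO)
    few       -- hypothetical cutoff for "only a few additional residues"
    """
    molecule, reference = molecule.upper(), reference.upper()
    if reference not in molecule:
        return "does not comprise"
    extra = len(molecule) - len(reference)
    if extra == 0:
        return "consists of"              # reference is the complete sequence
    if extra <= few:
        return "consists essentially of"  # only a few additional residues
    return "comprises"                    # reference is part of a longer molecule


if __name__ == "__main__":
    ref = "ATGGCTAAGC"  # made-up reference, not a sequence from the Figures
    print(sequence_relationship(ref, ref))
    print(sequence_relationship("GG" + ref + "TT", ref))
    print(sequence_relationship("A" * 200 + ref, ref))
```

Note that a molecule that consists of the reference also comprises it; the function simply reports
the narrowest label that applies.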

In Figures 1 and 3, both coding and non-coding sequences are provided. Because of the
source of the present invention, human genomic sequence (Figure 3) and cDNA/transcript
sequences (Figure 1), the nucleic acid molecules in the Figures will contain genomic intronic
sequences, 5' and 3' non-coding sequences, gene regulatory regions and non-coding intergenic
sequences. In general such sequence features are either noted in Figures 1 and 3 or can readily
be identified using computational tools known in the art. As discussed below, some of the
non-coding regions, particularly gene regulatory elements such as promoters, are useful for a variety
of purposes, e.g. control of heterologous gene expression, target for identifying gene activity
modulating compounds, and are particularly claimed as fragments of the genomic sequence
provided herein.

The isolated nucleic acid molecules can encode the mature protein plus additional amino or
carboxyl-terminal amino acids, or amino acids interior to the mature peptide (when the mature form
has more than one peptide chain, for instance). Such sequences may play a role in processing of a
protein from precursor to a mature form, facilitate protein trafficking, prolong or shorten protein
half-life or facilitate manipulation of a protein for assay or production, among other things. As
generally is the case in situ, the additional amino acids may be processed away from the mature
protein by cellular enzymes.

As mentioned above, the isolated nucleic acid molecules include, but are not limited to, the
sequence encoding the protease peptide alone, the sequence encoding the mature peptide and
additional coding sequences, such as a leader or secretory sequence (e.g., a pre-pro or pro-protein
sequence), the sequence encoding the mature peptide, with or without the additional coding
sequences, plus additional non-coding sequences, for example introns and non-coding 5' and 3'
sequences such as transcribed but non-translated sequences that play a role in transcription, mRNA
processing (including splicing and polyadenylation signals), ribosome binding and stability of
mRNA. In addition, the nucleic acid molecule may be fused to a marker sequence encoding, for
example, a peptide that facilitates purification.

Isolated nucleic acid molecules can be in the form of RNA, such as mRNA, or in the form of
DNA, including cDNA and genomic DNA obtained by cloning or produced by chemical synthetic
techniques or by a combination thereof. The nucleic acid, especially DNA, can be double-stranded
or single-stranded. Single-stranded nucleic acid can be the coding strand (sense strand) or the
non-coding strand (anti-sense strand).

The invention further provides nucleic acid molecules that encode fragments of the peptides
of the present invention as well as nucleic acid molecules that encode obvious variants of the
protease proteins of the present invention that are described above. Such nucleic acid molecules
may be naturally occurring, such as allelic variants (same locus), paralogs (different locus), and
orthologs (different organism), or may be constructed by recombinant DNA methods or by
chemical synthesis. Such non-naturally occurring variants may be made by mutagenesis
techniques, including those applied to nucleic acid molecules, cells, or organisms. Accordingly, as
discussed above, the variants can contain nucleotide substitutions, deletions, inversions and
insertions. Variation can occur in either or both the coding and non-coding regions. The variations
can produce both conservative and non-conservative amino acid substitutions.

The present invention further provides non-coding fragments of the nucleic acid molecules
provided in Figures 1 and 3. Preferred non-coding fragments include, but are not limited to,
promoter sequences, enhancer sequences, gene modulating sequences and gene termination
sequences. Such fragments are useful in controlling heterologous gene expression and in
developing screens to identify gene-modulating agents. A promoter can readily be identified as
being 5' to the ATG start site in the genomic sequence provided in Figure 3.

A fragment comprises a contiguous nucleotide sequence greater than 12 or more
nucleotides. Further, a fragment could be at least 30, 40, 50, 100, 250 or 500 nucleotides in length.
The length of the fragment will be based on its intended use. For example, the fragment can encode
epitope bearing regions of the peptide, or can be useful as DNA probes and primers. Such
fragments can be isolated using the known nucleotide sequence to synthesize an oligonucleotide
probe. A labeled probe can then be used to screen a cDNA library, genomic DNA library, or
mRNA to isolate nucleic acid corresponding to the coding region. Further, primers can be used in
PCR reactions to clone specific regions of the gene.

A probe/primer typically comprises substantially a purified oligonucleotide or
oligonucleotide pair. The oligonucleotide typically comprises a region of nucleotide sequence that
hybridizes under stringent conditions to at least about 12, 20, 25, 40, 50 or more consecutive
nucleotides.

Orthologs, homologs, and allelic variants can be identified using methods well known in the
art. As described in the Peptide Section, these variants comprise a nucleotide sequence encoding a
peptide that is typically 60-70%, 70-80%, 80-90%, and more typically at least about 90-95% or
more homologous to the nucleotide sequence shown in the Figure sheets or a fragment of this
sequence. Such nucleic acid molecules can readily be identified as being able to hybridize under
moderate to stringent conditions, to the nucleotide sequence shown in the Figure sheets or a
fragment of the sequence. Allelic variants can readily be determined by genetic locus of the
encoding gene.
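
By way of illustration only, the percentage ranges recited above can be evaluated for two
sequences that have already been aligned. The sketch below assumes that percent identity over
the aligned columns is an acceptable stand-in for "homologous" as used herein, and that gaps are
written as "-"; the function names percent_identity and homology_band are hypothetical and are
not part of the disclosure.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical columns between two pre-aligned, equal-length sequences.

    Assumption: a gap ('-') opposite a base counts as a mismatch, and columns
    that are gaps in both sequences are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def homology_band(identity: float) -> str:
    """Map a percent identity onto the ranges named in the text."""
    bands = ((90.0, "at least about 90%"), (80.0, "80-90%"),
             (70.0, "70-80%"), (60.0, "60-70%"))
    for lower_bound, label in bands:
        if identity >= lower_bound:
            return label
    return "below 60%"
```

In use, homology_band(percent_identity(aligned_a, aligned_b)) reports which recited range, if
any, a candidate variant falls into. Producing the alignment itself is outside this sketch.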

As used herein, the term "hybridizes under stringent conditions" is intended to describe
conditions for hybridization and washing under which nucleotide sequences encoding a peptide at
least 60-70% homologous to each other typically remain hybridized to each other. The conditions
can be such that sequences at least about 60%, at least about 70%, or at least about 80% or more
homologous to each other typically remain hybridized to each other. Such stringent conditions are
known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John
Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. One example of stringent hybridization conditions is
hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more
washes in 0.2X SSC, 0.1% SDS at 50-65°C. Examples of moderate to low stringency hybridization
conditions are well known in the art.

Nucleic Acid Molecule Uses 

The nucleic acid molecules of the present invention are useful for probes, primers, chemical
intermediates, and in biological assays. The nucleic acid molecules are useful as a hybridization
probe for messenger RNA, transcript/cDNA and genomic DNA to isolate full-length cDNA and
genomic clones encoding the peptide described in Figure 2 and to isolate cDNA and genomic
clones that correspond to variants (alleles, orthologs, etc.) producing the same or related peptides
shown in Figure 2. As indicated in Figure 3, SNPs, including insertion/deletion polymorphisms
("indels"), were identified at 69 different nucleotide positions in and around the gene encoding the
transporter protein of the present invention.

The probe can correspond to any sequence along the entire length of the nucleic acid 
molecules provided in the Figures. Accordingly, it could be derived from 5' noncoding regions, the
coding region, and 3' noncoding regions. However, as discussed, fragments are not to be construed
as encompassing fragments disclosed prior to the present invention.

The nucleic acid molecules are also useful as primers for PCR to amplify any given region
of a nucleic acid molecule and are useful to synthesize antisense molecules of desired length and
sequence.

The nucleic acid molecules are also useful for constructing recombinant vectors. Such
vectors include expression vectors that express a portion of, or all of, the peptide sequences.
Vectors also include insertion vectors, used to integrate into another nucleic acid molecule
sequence, such as into the cellular genome, to alter in situ expression of a gene and/or gene product.
For example, an endogenous coding sequence can be replaced via homologous recombination with
all or part of the coding region containing one or more specifically introduced mutations.

The nucleic acid molecules are also useful for expressing antigenic portions of the proteins.

The nucleic acid molecules are also useful as probes for determining the chromosomal
positions of the nucleic acid molecules by means of in situ hybridization methods. The gene
provided by the present invention is located on a genome component that has been mapped to
human chromosome 4 (as indicated in Figure 3), which is supported by multiple lines of evidence,
such as STS and BAC map data.

The nucleic acid molecules are also useful in making vectors containing the gene regulatory
regions of the nucleic acid molecules of the present invention.

The nucleic acid molecules are also useful for designing ribozymes corresponding to all, or
a part, of the mRNA produced from the nucleic acid molecules described herein.

The nucleic acid molecules are also useful for making vectors that express part, or all, of the
peptides.

The nucleic acid molecules are also useful for constructing host cells expressing a part, or
all, of the nucleic acid molecules and peptides.

The nucleic acid molecules are also useful for constructing transgenic animals expressing
all, or a part, of the nucleic acid molecules and peptides.

The nucleic acid molecules are also useful as hybridization probes for determining the
presence, level, form and distribution of nucleic acid expression. Experimental data as provided in
Figure 1 indicates that protease proteins of the present invention are expressed in humans in testis,
placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. Specifically, a
virtual northern blot shows expression in cancers. In addition, PCR-based tissue screening panels
indicate expression in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, and bone
marrow. Accordingly, the probes can be used to detect the presence of, or to determine levels of, a
specific nucleic acid molecule in cells, tissues, and in organisms. The nucleic acid whose level is
determined can be DNA or RNA. Accordingly, probes corresponding to the peptides described
herein can be used to assess expression and/or gene copy number in a given cell, tissue, or
organism. These uses are relevant for diagnosis of disorders involving an increase or decrease in
protease protein expression relative to normal results.

In vitro techniques for detection of mRNA include Northern hybridizations and in situ
hybridizations. In vitro techniques for detecting DNA include Southern hybridizations and in situ
hybridization.

Probes can be used as a part of a diagnostic test kit for identifying cells or tissues that
express a protease protein, such as by measuring a level of a protease-encoding nucleic acid in a
sample of cells from a subject, e.g., mRNA or genomic DNA, or determining if a protease gene has
been mutated. Experimental data as provided in Figure 1 indicates that protease proteins of the
present invention are expressed in humans in testis, placenta, fetal lung, fetal kidney, fetal heart,
fetal brain, bone marrow, and in cancers. Specifically, a virtual northern blot shows expression in
cancers. In addition, PCR-based tissue screening panels indicate expression in testis, placenta, fetal
lung, fetal kidney, fetal heart, fetal brain, and bone marrow.

Nucleic acid expression assays are useful for drug screening to identify compounds that
modulate protease nucleic acid expression.

The invention thus provides a method for identifying a compound that can be used to treat a
disorder associated with nucleic acid expression of the protease gene, particularly biological and
pathological processes that are mediated by the protease in cells and tissues that express it.
Experimental data as provided in Figure 1 indicates expression in humans in testis, placenta, fetal
lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. The method typically
includes assaying the ability of the compound to modulate the expression of the protease nucleic
acid and thus identifying a compound that can be used to treat a disorder characterized by undesired
protease nucleic acid expression. The assays can be performed in cell-based and cell-free systems.
Cell-based assays include cells naturally expressing the protease nucleic acid or recombinant cells
genetically engineered to express specific nucleic acid sequences.

The assay for protease nucleic acid expression can involve direct assay of nucleic acid
levels, such as mRNA levels, or on collateral compounds involved in the signal pathway. Further,
the expression of genes that are up- or down-regulated in response to the protease protein signal
pathway can also be assayed. In this embodiment the regulatory regions of these genes can be
operably linked to a reporter gene such as luciferase.

Thus, modulators of protease gene expression can be identified in a method wherein a cell is
contacted with a candidate compound and the expression of mRNA determined. The level of
expression of protease mRNA in the presence of the candidate compound is compared to the level
of expression of protease mRNA in the absence of the candidate compound. The candidate
compound can then be identified as a modulator of nucleic acid expression based on this
comparison and be used, for example, to treat a disorder characterized by aberrant nucleic acid
expression. When expression of mRNA is statistically significantly greater in the presence of the
candidate compound than in its absence, the candidate compound is identified as a stimulator of
nucleic acid expression. When nucleic acid expression is statistically significantly less in the
presence of the candidate compound than in its absence, the candidate compound is identified as an
inhibitor of nucleic acid expression.
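
By way of illustration only, the comparison described above can be sketched as a small
classification routine. The text does not name a statistical test; the sketch below assumes
replicate expression measurements and uses an exact two-sided permutation test on the difference
in means, with a significance threshold of 0.05. The function names, the choice of test and the
threshold are all assumptions made for the example.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(treated, untreated):
    """Two-sided exact permutation p-value for a difference in mean expression."""
    pooled = list(treated) + list(untreated)
    n_treated = len(treated)
    n_untreated = len(pooled) - n_treated
    observed = abs(mean(treated) - mean(untreated))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way of relabelling the pooled values as "treated".
    for idx in combinations(range(len(pooled)), n_treated):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n_treated - (total - group_sum) / n_untreated)
        if diff >= observed - 1e-9:  # tolerance for floating-point rounding
            extreme += 1
        count += 1
    return extreme / count


def classify_modulator(treated, untreated, alpha=0.05):
    """Label a candidate compound from mRNA levels with and without the compound."""
    p_value = exact_permutation_p(treated, untreated)
    if p_value >= alpha:
        return "no significant effect"
    return "stimulator" if mean(treated) > mean(untreated) else "inhibitor"
```

With an exact permutation test the smallest attainable p-value depends on the number of
replicates; with three measurements per group it is 2/20, so at least four per group are needed
before any compound can be called at the 0.05 level under these assumptions.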

The invention further provides methods of treatment, with the nucleic acid as a target, using
a compound identified through drug screening as a gene modulator to modulate protease nucleic
acid expression in cells and tissues that express the protease. Experimental data as provided in
Figure 1 indicates that protease proteins of the present invention are expressed in humans in testis,
placenta, fetal lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers. Specifically, a
virtual northern blot shows expression in cancers. In addition, PCR-based tissue screening panels
indicate expression in testis, placenta, fetal lung, fetal kidney, fetal heart, fetal brain, and bone
marrow. Modulation includes both up-regulation (i.e. activation or agonization) and down-regulation
(suppression or antagonization) of nucleic acid expression.

Alternatively, a modulator for protease nucleic acid expression can be a small molecule or
drug identified using the screening assays described herein as long as the drug or small molecule
inhibits the protease nucleic acid expression in the cells and tissues that express the protein.
Experimental data as provided in Figure 1 indicates expression in humans in testis, placenta, fetal
lung, fetal kidney, fetal heart, fetal brain, bone marrow, and in cancers.

The nucleic acid molecules are also useful for monitoring the effectiveness of modulating
compounds on the expression or activity of the protease gene in clinical trials or in a treatment
regimen. Thus, the gene expression pattern can serve as a barometer for the continuing
effectiveness of treatment with the compound, particularly with compounds to which a patient can
develop resistance. The gene expression pattern can also serve as a marker indicative of a
physiological response of the affected cells to the compound. Accordingly, such monitoring would
allow either increased administration of the compound or the administration of alternative
compounds to which the patient has not become resistant. Similarly, if the level of nucleic acid
expression falls below a desirable level, administration of the compound could be commensurately
decreased.

The nucleic acid molecules are also useful in diagnostic assays for qualitative changes in
protease nucleic acid expression, and particularly in qualitative changes that lead to pathology. The
nucleic acid molecules can be used to detect mutations in protease genes and gene expression
products such as mRNA. The nucleic acid molecules can be used as hybridization probes to detect
naturally occurring genetic mutations in the protease gene and thereby to determine whether a
subject with the mutation is at risk for a disorder caused by the mutation. Mutations include
deletion, addition, or substitution of one or more nucleotides in the gene, chromosomal
rearrangement, such as inversion or transposition, modification of genomic DNA, such as aberrant
methylation patterns or changes in gene copy number, such as amplification. Detection of a
mutated form of the protease gene associated with a dysfunction provides a diagnostic tool for an
active disease or susceptibility to disease when the disease results from overexpression,
underexpression, or altered expression of a protease protein.

Individuals carrying mutations in the protease gene can be detected at the nucleic acid level
by a variety of techniques. Figure 3 provides information on SNPs that have been identified in the
gene encoding the protease protein of the present invention. SNPs, including indels (indicated by a
"-"), were identified at 69 different nucleotide positions. Non-synonymous cSNPs were identified at
position 30496. The changes in the amino acid sequence caused by these SNPs are indicated in
Figure 3 and can readily be determined using the universal genetic code and the protein sequence
provided in Figure 2 as a reference. SNPs outside the ORF and in introns may affect
control/regulatory elements. The gene provided by the present invention is located on a genome
component that has been mapped to human chromosome 4 (as indicated in Figure 3), which is
supported by multiple lines of evidence, such as STS and BAC map data. Genomic DNA can be
analyzed directly or can be amplified by using PCR prior to analysis. RNA or cDNA can be used in
the same way. In some uses, detection of the mutation involves the use of a probe/primer in a
polymerase chain reaction (PCR) (see, e.g. U.S. Patent Nos. 4,683,195 and 4,683,202), such as
anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g.,
Landegran et al., Science 241:1077-1080 (1988); and Nakazawa et al., PNAS 91:360-364 (1994)),
the latter of which can be particularly useful for detecting point mutations in the gene (see Abravaya
et al., Nucleic Acids Res. 23:675-682 (1995)). This method can include the steps of collecting a
sample of cells from a patient, isolating nucleic acid (e.g., genomic, mRNA or both) from the cells
of the sample, contacting the nucleic acid sample with one or more primers which specifically
hybridize to a gene under conditions such that hybridization and amplification of the gene (if
present) occurs, and detecting the presence or absence of an amplification product, or detecting the
size of the amplification product and comparing the length to a control sample. Deletions and
insertions can be detected by a change in size of the amplified product compared to the normal
genotype. Point mutations can be identified by hybridizing amplified DNA to normal RNA or
antisense DNA sequences.
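
By way of illustration only, the size comparison described above can be modeled in silico. The
sketch below assumes exact primer matching on a single template strand and a reverse primer
written 5' to 3' as synthesized; real primer annealing tolerates mismatches and depends on
reaction conditions, which this example ignores. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product bounded by the two primers, or None if either fails to match."""
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    # The reverse primer anneals to the top strand at its reverse complement.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end < 0:
        return None
    return end + len(reverse_site) - start


def compare_to_control(sample: str, control: str, forward: str, reverse: str) -> str:
    """Report an insertion or deletion from the change in product size versus the control."""
    sample_len = amplicon_length(sample, forward, reverse)
    control_len = amplicon_length(control, forward, reverse)
    if control_len is None:
        raise ValueError("primers do not amplify the control sequence")
    if sample_len is None:
        return "no product"
    if sample_len > control_len:
        return f"insertion of {sample_len - control_len} nt"
    if sample_len < control_len:
        return f"deletion of {control_len - sample_len} nt"
    return "same size as control"
```

A result of "same size as control" does not exclude a point mutation, which leaves the product
length unchanged and is detected by the hybridization approaches noted above.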

Alternatively, mutations in a protease gene can be directly identified, for example, by 
alterations in restriction enzyme digestion patterns determined by gel electrophoresis.
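
By way of illustration only, a change in digestion pattern can be predicted by comparing the
fragment sizes expected from a reference sequence and from a sample sequence. The sketch below
assumes a linear molecule, a single enzyme, and a cut position given as an offset within the
recognition site on the top strand (for example, offset 1 for EcoRI, which cuts G^AATTC). The
function names are hypothetical.

```python
def digest_fragment_sizes(seq: str, site: str, cut_offset: int):
    """Fragment lengths, largest first, from cutting a linear sequence at every `site`."""
    seq = seq.upper()
    site = site.upper()
    cuts = []
    pos = seq.find(site)
    while pos >= 0:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip zero-length pieces that arise when a cut falls exactly at an end.
    sizes = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return sorted(sizes, reverse=True)


def pattern_differs(reference: str, sample: str, site: str, cut_offset: int) -> bool:
    """True when the two sequences would give different band sizes on a gel."""
    return (digest_fragment_sizes(reference, site, cut_offset)
            != digest_fragment_sizes(sample, site, cut_offset))
```

A mutation that creates or destroys a recognition site changes the list of fragment sizes, while
a mutation elsewhere in the sequence leaves it unchanged unless it alters the length.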

Further, sequence-specific ribozymes (U.S. Patent No. 5,498,531) can be used to score for
the presence of specific mutations by development or loss of a ribozyme cleavage site. Perfectly
matched sequences can be distinguished from mismatched sequences by nuclease cleavage
digestion assays or by differences in melting temperature.

Sequence changes at specific locations can also be assessed by nuclease protection assays
such as RNase and S1 protection or the chemical cleavage method. Furthermore, sequence
differences between a mutant protease gene and a wild-type gene can be determined by direct DNA
sequencing. A variety of automated sequencing procedures can be utilized when performing the
diagnostic assays (Naeve, C.W., (1995) Biotechniques 19:448), including sequencing by mass
spectrometry (see, e.g., PCT International Publication No. WO 94/16101; Cohen et al., Adv.
Chromatogr. 36:127-162 (1996); and Griffin et al., Appl. Biochem. Biotechnol. 38:147-159 (1993)).

Other methods for detecting mutations in the gene include methods in which protection
from cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA duplexes
(Myers et al., Science 230:1242 (1985); Cotton et al., PNAS 85:4397 (1988); Saleeba et al., Meth.
Enzymol. 217:286-295 (1992)), electrophoretic mobility of mutant and wild type nucleic acid is
compared (Orita et al., PNAS 86:2766 (1989); Cotton et al., Mutat. Res. 285:125-144 (1993); and
Hayashi et al., Genet. Anal. Tech. Appl. 9:73-79 (1992)), and movement of mutant or wild-type
fragments in polyacrylamide gels containing a gradient of denaturant is assayed using denaturing
gradient gel electrophoresis (Myers et al., Nature 313:495 (1985)). Examples of other techniques
for detecting point mutations include selective oligonucleotide hybridization, selective
amplification, and selective primer extension.

The nucleic acid molecules are also useful for testing an individual for a genotype that, while
not necessarily causing the disease, nevertheless affects the treatment modality. Thus, the nucleic
acid molecules can be used to study the relationship between an individual's genotype and the
individual's response to a compound used for treatment (pharmacogenomic relationship).
Accordingly, the nucleic acid molecules described herein can be used to assess the mutation content
of the protease gene in an individual in order to select an appropriate compound or dosage regimen
for treatment.

Thus nucleic acid molecules displaying genetic variations that affect treatment provide a
diagnostic target that can be used to tailor treatment in an individual. Accordingly, the production
of recombinant cells and animals containing these polymorphisms allows effective clinical design of
treatment compounds and dosage regimens.

The nucleic acid molecules are thus useful as antisense constructs to control protease gene
expression in cells, tissues, and organisms. A DNA antisense nucleic acid molecule is designed to
be complementary to a region of the gene involved in transcription, preventing transcription and
hence production of protease protein. An antisense RNA or DNA nucleic acid molecule would
hybridize to the mRNA and thus block translation of mRNA into protease protein. Figure 3
provides information on SNPs that have been identified in the gene encoding the protease protein of
the present invention. SNPs, including indels (indicated by a "-"), were identified at 69 different
nucleotide positions. Non-synonymous cSNPs were identified at position 30496. The changes in the
amino acid sequence caused by these SNPs are indicated in Figure 3 and can readily be determined
using the universal genetic code and the protein sequence provided in Figure 2 as a reference. SNPs
outside the ORF and in introns may affect control/regulatory elements.
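
By way of illustration only, the complementarity on which an antisense construct relies can be
sketched as follows. The example derives a DNA oligonucleotide, written 5' to 3', that is
complementary to a chosen stretch of an mRNA. The function name, the 0-based start coordinate
and the example sequence are assumptions; selection of an effective target region is not
addressed.

```python
def antisense_oligo(mrna: str, start: int, length: int) -> str:
    """DNA oligo (5'->3') complementary to mrna[start:start + length].

    Accepts the mRNA written with either U or T; `start` is 0-based.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be positive")
    target = mrna.upper().replace("T", "U")[start:start + length]
    if len(target) != length:
        raise ValueError("requested region extends past the end of the mRNA")
    pairing = {"A": "T", "U": "A", "G": "C", "C": "G"}
    # Read the target 3'->5' so that the returned oligo is written 5'->3'.
    return "".join(pairing[base] for base in reversed(target))
```

For a hypothetical mRNA beginning AUGGCC, antisense_oligo(mrna, 0, 6) returns GGCCAT, which
pairs with the first six nucleotides when aligned antiparallel.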

Alternatively, a class of antisense molecules can be used to inactivate mRNA in order to
decrease expression of protease nucleic acid. Accordingly, these molecules can treat a disorder
characterized by abnormal or undesired protease nucleic acid expression. This technique involves
cleavage by means of ribozymes containing nucleotide sequences complementary to one or more
regions in the mRNA that attenuate the ability of the mRNA to be translated. Possible regions
include coding regions and particularly coding regions corresponding to the catalytic and other
functional activities of the protease protein, such as substrate binding.

The nucleic acid molecules also provide vectors for gene therapy in patients containing cells
that are aberrant in protease gene expression. Thus, recombinant cells, which include the patient's
cells that have been engineered ex vivo and returned to the patient, are introduced into an individual
where the cells produce the desired protease protein to treat the individual.

The invention also encompasses kits for detecting the presence of a protease nucleic acid in
a biological sample. Experimental data as provided in Figure 1 indicates that protease proteins of
the present invention are expressed in humans in testis, placenta, fetal lung, fetal kidney, fetal heart,
fetal brain, bone marrow, and in cancers. Specifically, a virtual northern blot shows expression in
cancers. In addition, PCR-based tissue screening panels indicate expression in testis, placenta, fetal
lung, fetal kidney, fetal heart, fetal brain, and bone marrow. For example, the kit can comprise
reagents such as a labeled or labelable nucleic acid or agent capable of detecting protease nucleic
acid in a biological sample; means for determining the amount of protease nucleic acid in the
sample; and means for comparing the amount of protease nucleic acid in the sample with a standard.
The compound or agent can be packaged in a suitable container. The kit can further comprise
instructions for using the kit to detect protease protein mRNA or DNA.



Nucleic Acid Arrays 

The present invention further provides nucleic acid detection kits, such as arrays or 
microarrays of nucleic acid molecules that are based on the sequence information provided in
Figures 1 and 3 (SEQ ID NOS:1 and 3).

As used herein "Arrays" or "Microarrays" refers to an array of distinct polynucleotides or
oligonucleotides synthesized on a substrate, such as paper, nylon or other type of membrane,
filter, chip, glass slide, or any other suitable solid support. In one embodiment, the microarray is
prepared and used according to the methods described in US Patent 5,837,832, Chee et al., PCT
application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (1996; Nat. Biotech. 14: 1675-1680)
and Schena, M. et al. (1996; Proc. Natl. Acad. Sci. 93: 10614-10619), all of which are
incorporated herein in their entirety by reference. In other embodiments, such arrays are
produced by the methods described by Brown et al., US Patent No. 5,807,522.

The microarray or detection kit is preferably composed of a large number of unique,
single-stranded nucleic acid sequences, usually either synthetic antisense oligonucleotides or
fragments of cDNAs, fixed to a solid support. The oligonucleotides are preferably about 6-60
nucleotides in length, more preferably 15-30 nucleotides in length, and most preferably about
20-25 nucleotides in length. For a certain type of microarray or detection kit, it may be preferable to
use oligonucleotides that are only 7-20 nucleotides in length. The microarray or detection kit
may contain oligonucleotides that cover the known 5', or 3', sequence, sequential
oligonucleotides which cover the full length sequence; or unique oligonucleotides selected from
particular areas along the length of the sequence. Polynucleotides used in the microarray or
detection kit may be oligonucleotides that are specific to a gene or genes of interest.

In order to produce oligonucleotides to a known sequence for a microarray or detection
kit, the gene(s) of interest (or an ORF identified from the contigs of the present invention) is
typically examined using a computer algorithm which starts at the 5' or at the 3' end of the
nucleotide sequence. Typical algorithms will then identify oligomers of defined length that are
unique to the gene, have a GC content within a range suitable for hybridization, and lack
predicted secondary structure that may interfere with hybridization. In certain situations it may
be appropriate to use pairs of oligonucleotides on a microarray or detection kit. The "pairs" will
be identical, except for one nucleotide that preferably is located in the center of the sequence.
The second oligonucleotide in the pair (mismatched by one) serves as a control. The number of
oligonucleotide pairs may range from two to one million. The oligomers are synthesized at
designated areas on a substrate using a light-directed chemical process. The substrate may be
paper, nylon or other type of membrane, filter, chip, glass slide or any other suitable solid
support.
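
By way of illustration only, an algorithm of the kind described above can be sketched as a scan
from the 5' end that keeps oligomers meeting three filters and then derives a one-base mismatch
control for each. The GC range of 40-60%, the 4-base stem with a 3-base minimum loop used as a
crude stand-in for secondary structure prediction, and all function names are assumptions made
for the example and are not taken from the disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(oligo: str) -> float:
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def has_hairpin_stem(oligo: str, stem: int = 4, min_loop: int = 3) -> bool:
    """Crude check: can any `stem`-base stretch pair with a later stretch of the oligo?"""
    for i in range(len(oligo) - stem + 1):
        arm = oligo[i:i + stem]
        if reverse_complement(arm) in oligo[i + stem + min_loop:]:
            return True
    return False


def select_probes(gene, background, length=25, gc_range=(0.4, 0.6)):
    """Scan from the 5' end and return (start, oligo) pairs that pass every filter."""
    gene = gene.upper()
    others = [seq.upper() for seq in background]
    probes = []
    for start in range(len(gene) - length + 1):
        oligo = gene[start:start + length]
        if not gc_range[0] <= gc_fraction(oligo) <= gc_range[1]:
            continue
        # Unique within the gene itself and absent from every background sequence.
        if gene.find(oligo) != start or gene.find(oligo, start + 1) != -1:
            continue
        if any(oligo in seq for seq in others):
            continue
        if has_hairpin_stem(oligo):
            continue
        probes.append((start, oligo))
    return probes


def mismatch_control(oligo: str) -> str:
    """Second member of the pair: identical except for the central nucleotide."""
    mid = len(oligo) // 2
    swap = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return oligo[:mid] + swap[oligo[mid]] + oligo[mid + 1:]
```

Scanning from the 3' end, as the text also contemplates, amounts to iterating over the start
positions in reverse order. Exact substring matching is a simplification of uniqueness; a
production design would also screen for near matches.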

In another aspect, an oligonucleotide may be synthesized on the surface of the substrate
by using a chemical coupling procedure and an ink jet application apparatus, as described in PCT
application WO95/251116 (Baldeschweiler et al.) which is incorporated herein in its entirety by
reference. In another aspect, a "gridded" array analogous to a dot (or slot) blot may be used to
arrange and link cDNA fragments or oligonucleotides to the surface of a substrate using a
vacuum system, thermal, UV, mechanical or chemical bonding procedures. An array, such as
those described above, may be produced by hand or by using available devices (slot blot or dot
blot apparatus), materials (any suitable solid support), and machines (including robotic
instruments), and may contain 8, 24, 96, 384, 1536, 6144 or more oligonucleotides, or any other
number between two and one million which lends itself to the efficient use of commercially
available instrumentation.

In order to conduct sample analysis using a microarray or detection kit, the RNA or DNA
from a biological sample is made into hybridization probes. The mRNA is isolated, and cDNA is
produced and used as a template to make antisense RNA (aRNA). The aRNA is amplified in the
presence of fluorescent nucleotides, and labeled probes are incubated with the microarray or
detection kit so that the probe sequences hybridize to complementary oligonucleotides of the
microarray or detection kit. Incubation conditions are adjusted so that hybridization occurs with
precise complementary matches or with various degrees of less complementarity. After removal
of nonhybridized probes, a scanner is used to determine the levels and patterns of fluorescence.
The scanned images are examined to determine degree of complementarity and the relative
abundance of each oligonucleotide sequence on the microarray or detection kit. The biological
samples may be obtained from any bodily fluids (such as blood, urine, saliva, phlegm, gastric
juices, etc.), cultured cells, biopsies, or other tissue preparations. A detection system may be
used to measure the absence, presence, and amount of hybridization for all of the distinct
sequences simultaneously. This data may be used for large-scale correlation studies on the
sequences, expression patterns, mutations, variants, or polymorphisms among samples.

Using such arrays, the present invention provides methods to identify the expression of
the protease proteins/peptides of the present invention. In detail, such methods comprise
incubating a test sample with one or more nucleic acid molecules and assaying for binding of the
nucleic acid molecule with components within the test sample. Such assays will typically
involve arrays comprising many genes, at least one of which is a gene of the present invention
and/or alleles of the protease gene of the present invention. Figure 3 provides information on
SNPs that have been identified in the gene encoding the protease protein of the present
invention. SNPs, including indels (indicated by a "-"), were identified at 69 different nucleotide
positions. Non-synonymous cSNPs were identified at position 30496. The changes in the amino
acid sequence caused by these SNPs are indicated in Figure 3 and can readily be determined using
the universal genetic code and the protein sequence provided in Figure 2 as a reference. SNPs
outside the ORF and in introns may affect control/regulatory elements.
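
By way of illustration only, the determination of an amino acid change from a coding SNP using
the universal genetic code can be sketched as follows. The example assumes that the SNP has
already been mapped from its genomic position in Figure 3 to a 0-based index within the coding
sequence; that mapping, the function name snp_effect and the short example sequence are
assumptions and are not taken from the Figures.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def snp_effect(cds: str, cds_index: int, alt_base: str):
    """Return (reference amino acid, variant amino acid, classification) for a substitution."""
    cds = cds.upper()
    offset = cds_index % 3
    codon_start = cds_index - offset
    ref_codon = cds[codon_start:codon_start + 3]
    if cds_index < 0 or len(ref_codon) != 3:
        raise ValueError("position does not fall within a complete codon")
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind
```

For a hypothetical coding sequence ATGGAAGGC, a substitution of T at index 4 changes the codon
GAA to GTA and is reported as a non-synonymous change from E to V, whereas a substitution of T
at index 8 changes GGC to GGT and is reported as synonymous.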

Conditions for incubating a nucleic acid molecule with a test sample vary. Incubation
conditions depend on the format employed in the assay, the detection methods employed, and the
type and nature of the nucleic acid molecule used in the assay. One skilled in the art will
recognize that any one of the commonly available hybridization, amplification or array assay
formats can readily be adapted to employ the novel fragments of the Human genome disclosed
herein. Examples of such assays can be found in Chard, T., An Introduction to
Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The
Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic
Press, Orlando, FL Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and
Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular
Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).

The test samples of the present invention include cells, protein or membrane extracts of
cells. The test sample used in the above-described method will vary based on the assay format,
nature of the detection method and the tissues, cells or extracts used as the sample to be assayed.
Methods for preparing nucleic acid extracts or of cells are well known in the art and can
readily be adapted in order to obtain a sample that is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which contain the
necessary reagents to carry out the assays of the present invention.

Specifically, the invention provides a compartmentalized kit to receive, in close
confinement, one or more containers which comprises: (a) a first container comprising one of the
nucleic acid molecules that can bind to a fragment of the Human genome disclosed herein; and
(b) one or more other containers comprising one or more of the following: wash reagents,
reagents capable of detecting presence of a bound nucleic acid.

In detail, a compartmentalized kit includes any kit in which reagents are contained in
separate containers. Such containers include small glass containers, plastic containers, strips of
plastic, glass or paper, or arraying material such as silica. Such containers allow one to
efficiently transfer reagents from one compartment to another compartment such that the
samples and reagents are not cross-contaminated, and the agents or solutions of each container
can be added in a quantitative fashion from one compartment to another. Such containers will
include a container which will accept the test sample, a container which contains the nucleic acid
probe, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers,
etc.), and containers which contain the reagents used to detect the bound probe. One skilled in
the art will readily recognize that the previously unidentified protease gene of the present
invention can be routinely identified using the sequence information disclosed herein and can be
readily incorporated into one of the established kit formats which are well known in the art,
particularly expression arrays.

Vectors/host cells

The invention also provides vectors containing the nucleic acid molecules described herein.
The term "vector" refers to a vehicle, preferably a nucleic acid molecule, which can transport the
nucleic acid molecules. When the vector is a nucleic acid molecule, the nucleic acid molecules are
covalently linked to the vector nucleic acid. With this aspect of the invention, the vector includes a
plasmid, single or double stranded phage, a single or double stranded RNA or DNA viral vector, or
artificial chromosome, such as a BAC, PAC, YAC, or MAC.

A vector can be maintained in the host cell as an extrachromosomal element where it
replicates and produces additional copies of the nucleic acid molecules. Alternatively, the vector
may integrate into the host cell genome and produce additional copies of the nucleic acid molecules
when the host cell replicates.

The invention provides vectors for the maintenance (cloning vectors) or vectors for
expression (expression vectors) of the nucleic acid molecules. The vectors can function in
prokaryotic or eukaryotic cells or in both (shuttle vectors).

Expression vectors contain cis-acting regulatory regions that are operably linked in the
vector to the nucleic acid molecules such that transcription of the nucleic acid molecules is allowed
in a host cell. The nucleic acid molecules can be introduced into the host cell with a separate
nucleic acid molecule capable of affecting transcription. Thus, the second nucleic acid molecule
may provide a trans-acting factor interacting with the cis-regulatory control region to allow
transcription of the nucleic acid molecules from the vector. Alternatively, a trans-acting factor may
be supplied by the host cell. Finally, a trans-acting factor can be produced from the vector itself. It
is understood, however, that in some embodiments, transcription and/or translation of the nucleic
acid molecules can occur in a cell-free system.

The regulatory sequences to which the nucleic acid molecules described herein can be
operably linked include promoters for directing mRNA transcription. These include, but are not
limited to, the left promoter from bacteriophage λ, the lac, TRP, and TAC promoters from E. coli,
the early and late promoters from SV40, the CMV immediate early promoter, the adenovirus early
and late promoters, and retrovirus long-terminal repeats.

In addition to control regions that promote transcription, expression vectors may also
include regions that modulate transcription, such as repressor binding sites and enhancers.
Examples include the SV40 enhancer, the cytomegalovirus immediate early enhancer, polyoma
enhancer, adenovirus enhancers, and retrovirus LTR enhancers.

In addition to containing sites for transcription initiation and control, expression vectors can
also contain sequences necessary for transcription termination and, in the transcribed region, a
ribosome binding site for translation. Other regulatory control elements for expression include
initiation and termination codons as well as polyadenylation signals. The person of ordinary skill in
the art would be aware of the numerous regulatory sequences that are useful in expression vectors.
Such regulatory sequences are described, for example, in Sambrook et al., Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY,
(1989).

A variety of expression vectors can be used to express a nucleic acid molecule. Such
vectors include chromosomal, episomal, and virus-derived vectors, for example vectors derived
from bacterial plasmids, from bacteriophage, from yeast episomes, from yeast chromosomal
elements, including yeast artificial chromosomes, from viruses such as baculoviruses,
papovaviruses such as SV40, Vaccinia viruses, adenoviruses, poxviruses, pseudorabies viruses, and
retroviruses. Vectors may also be derived from combinations of these sources such as those derived
from plasmid and bacteriophage genetic elements, e.g., cosmids and phagemids. Appropriate
cloning and expression vectors for prokaryotic and eukaryotic hosts are described in Sambrook et
al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, NY, (1989).

The regulatory sequence may provide constitutive expression in one or more host cells (i.e.
tissue specific) or may provide for inducible expression in one or more cell types such as by
temperature, nutrient additive, or exogenous factor such as a hormone or other ligand. A variety of
vectors providing for constitutive and inducible expression in prokaryotic and eukaryotic hosts are
well known to those of ordinary skill in the art.

The nucleic acid molecules can be inserted into the vector nucleic acid by well-known
methodology. Generally, the DNA sequence that will ultimately be expressed is joined to an
expression vector by cleaving the DNA sequence and the expression vector with one or more
restriction enzymes and then ligating the fragments together. Procedures for restriction enzyme
digestion and ligation are well known to those of ordinary skill in the art.
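
The cleavage step can be illustrated in silico. The sketch below, offered only as an assumption-
laden illustration, locates a recognition site on the top strand of a linear sequence and returns
the resulting fragments. The EcoRI recognition sequence (GAATTC, cut after the first G) is a
well-known fact; the function names and the example sequence are hypothetical, and overhang
compatibility and circular vectors are deliberately ignored.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start indices of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut a linear top strand `cut_offset` bases into each recognition site."""
    seq = seq.upper()
    fragments, start = [], 0
    for index in find_sites(seq, site):
        cut = index + cut_offset
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


# EcoRI recognizes GAATTC and cuts between G and A on the top strand.
ECORI_SITE, ECORI_OFFSET = "GAATTC", 1

# Hypothetical insert sequence used only for illustration.
example = "AAGAATTCTTGGATCCAA"
print(find_sites(example, ECORI_SITE))
print(digest(example, ECORI_SITE, ECORI_OFFSET))
```

A check of this kind is typically used to confirm that the chosen enzyme does not also cut inside
the coding sequence to be expressed.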

The vector containing the appropriate nucleic acid molecule can be introduced into an
appropriate host cell for propagation or expression using well-known techniques. Bacterial cells
include, but are not limited to, E. coli, Streptomyces, and Salmonella typhimurium. Eukaryotic cells
include, but are not limited to, yeast, insect cells such as Drosophila, animal cells such as COS and
CHO cells, and plant cells.

As described herein, it may be desirable to express the peptide as a fusion protein.
Accordingly, the invention provides fusion vectors that allow for the production of the peptides.
Fusion vectors can increase the expression of a recombinant protein, increase the solubility of the
recombinant protein, and aid in the purification of the protein by acting, for example, as a ligand for
affinity purification. A proteolytic cleavage site may be introduced at the junction of the fusion
moiety so that the desired peptide can ultimately be separated from the fusion moiety. Proteolytic
enzymes include, but are not limited to, factor Xa, thrombin, and enterokinase. Typical fusion
expression vectors include pGEX (Smith et al., Gene 67:31-40 (1988)), pMAL (New England
Biolabs, Beverly, MA) and pRIT5 (Pharmacia, Piscataway, NJ) which fuse glutathione
S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant
protein. Examples of suitable inducible non-fusion E. coli expression vectors include pTrc (Amann
et al., Gene 69:301-315 (1988)) and pET 11d (Studier et al., Gene Expression Technology: Methods
in Enzymology 185:60-89 (1990)).

Recombinant protein expression can be maximized in host bacteria by providing a genetic
background wherein the host cell has an impaired capacity to proteolytically cleave the recombinant
protein (Gottesman, S., Gene Expression Technology: Methods in Enzymology 185, Academic
Press, San Diego, California (1990) 119-128). Alternatively, the sequence of the nucleic acid
molecule of interest can be altered to provide preferential codon usage for a specific host cell, for
example E. coli (Wada et al., Nucleic Acids Res. 20:2111-2118 (1992)).
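
Altering a sequence for preferential codon usage can be sketched as a synonymous recoding: each
codon is replaced by a host-preferred codon for the same amino acid, so the encoded protein is
unchanged. In the Python sketch below, the `PREFERRED` table is a small illustrative placeholder
and does not reproduce measured codon frequencies from Wada et al. or any other source; the
function names are likewise assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Placeholder preference table (amino acid -> codon); a real table would be
# built from codon usage data for the chosen host.
PREFERRED = {"L": "CTG", "R": "CGT", "K": "AAA"}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, ignoring any trailing partial codon."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))


def recode(cds: str, preferred: dict[str, str]) -> str:
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    cds = cds.upper()
    out = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


original = "ATGTTAAGATAA"  # hypothetical CDS: M L R stop
optimized = recode(original, PREFERRED)
print(optimized)
print(translate(original) == translate(optimized))
```

The final comparison confirms that only the nucleotide sequence, not the protein, has changed.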

The nucleic acid molecules can also be expressed by expression vectors that are operative in
yeast. Examples of vectors for expression in yeast, e.g., S. cerevisiae, include pYepSec1 (Baldari, et
al., EMBO J. 6:229-234 (1987)), pMFa (Kurjan et al., Cell 30:933-943 (1982)), pJRY88 (Schultz et
al., Gene 54:113-123 (1987)), and pYES2 (Invitrogen Corporation, San Diego, CA).

The nucleic acid molecules can also be expressed in insect cells using, for example,
baculovirus expression vectors. Baculovirus vectors available for expression of proteins in cultured
insect cells (e.g., Sf 9 cells) include the pAc series (Smith et al., Mol. Cell Biol. 3:2156-2165
(1983)) and the pVL series (Lucklow et al., Virology 170:31-39 (1989)).

In certain embodiments of the invention, the nucleic acid molecules described herein are
expressed in mammalian cells using mammalian expression vectors. Examples of mammalian
expression vectors include pCDM8 (Seed, B. Nature 329:840 (1987)) and pMT2PC (Kaufman et al.,
EMBO J. 6:187-195 (1987)).

The expression vectors listed herein are provided by way of example only of the well-known
vectors available to those of ordinary skill in the art that would be useful to express the
nucleic acid molecules. The person of ordinary skill in the art would be aware of other vectors
suitable for maintenance, propagation, or expression of the nucleic acid molecules described herein.
These are found, for example, in Sambrook, J., Fritsch, E. F., and Maniatis, T. Molecular Cloning: A
Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY, 1989.

The invention also encompasses vectors in which the nucleic acid sequences described
herein are cloned into the vector in reverse orientation, but operably linked to a regulatory sequence
that permits transcription of antisense RNA. Thus, an antisense transcript can be produced to all, or
to a portion, of the nucleic acid molecule sequences described herein, including both coding and
non-coding regions. Expression of this antisense RNA is subject to each of the parameters
described above in relation to expression of the sense RNA (regulatory sequences, constitutive or
inducible expression, tissue-specific expression).
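
The relationship between a sense sequence and the antisense transcript produced from a
reverse-orientation insert can be sketched as follows. The function names and the example
sequence are illustrative assumptions; the input is taken to be the sense (coding) strand written
5' to 3'.

```python
# Complement map for DNA bases, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """RNA transcribed from the insert in reverse orientation (complementary to the mRNA)."""
    return reverse_complement(sense_dna).replace("T", "U").replace("t", "u")


sense = "ATGGCC"  # hypothetical fragment of a sense strand
print(reverse_complement(sense))
print(antisense_rna(sense))
```

Because the antisense transcript is the reverse complement of the sense strand, it can base-pair
with the corresponding region of the sense mRNA.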

The invention also relates to recombinant host cells containing the vectors described herein.
Host cells therefore include prokaryotic cells, lower eukaryotic cells such as yeast, other eukaryotic
cells such as insect cells, and higher eukaryotic cells such as mammalian cells.

The recombinant host cells are prepared by introducing the vector constructs described
herein into the cells by techniques readily available to the person of ordinary skill in the art. These
include, but are not limited to, calcium phosphate transfection, DEAE-dextran-mediated
transfection, cationic lipid-mediated transfection, electroporation, transduction, infection,
lipofection, and other techniques such as those found in Sambrook, et al. (Molecular Cloning: A
Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY, 1989).

Host cells can contain more than one vector. Thus, different nucleotide sequences can be
introduced on different vectors of the same cell. Similarly, the nucleic acid molecules can be
introduced either alone or with other nucleic acid molecules that are not related to the nucleic acid
molecules such as those providing trans-acting factors for expression vectors. When more than one
vector is introduced into a cell, the vectors can be introduced independently, co-introduced or joined
to the nucleic acid molecule vector.

In the case of bacteriophage and viral vectors, these can be introduced into cells as packaged
or encapsulated virus by standard procedures for infection and transduction. Viral vectors can be
replication-competent or replication-defective. In the case in which viral replication is defective,
replication will occur in host cells providing functions that complement the defects.

Vectors generally include selectable markers that enable the selection of the subpopulation
of cells that contain the recombinant vector constructs. The marker can be contained in the same
vector that contains the nucleic acid molecules described herein or may be on a separate vector.
Markers include tetracycline or ampicillin-resistance genes for prokaryotic host cells and
dihydrofolate reductase or neomycin resistance for eukaryotic host cells. However, any marker that
provides selection for a phenotypic trait will be effective.

While the mature proteins can be produced in bacteria, yeast, mammalian cells, and other
cells under the control of the appropriate regulatory sequences, cell-free transcription and
translation systems can also be used to produce these proteins using RNA derived from the DNA
constructs described herein.

Where secretion of the peptide is desired, which is difficult to achieve with
multi-transmembrane domain containing proteins such as proteases, appropriate secretion signals are
incorporated into the vector. The signal sequence can be endogenous to the peptides or
heterologous to these peptides.

Where the peptide is not secreted into the medium, which is typically the case with
proteases, the protein can be isolated from the host cell by standard disruption procedures, including
freeze thaw, sonication, mechanical disruption, use of lysing agents and the like. The peptide can
then be recovered and purified by well-known purification methods including ammonium sulfate
precipitation, acid extraction, anion or cationic exchange chromatography, phosphocellulose
chromatography, hydrophobic-interaction chromatography, affinity chromatography,
hydroxylapatite chromatography, lectin chromatography, or high performance liquid
chromatography.

It is also understood that depending upon the host cell in recombinant production of the
peptides described herein, the peptides can have various glycosylation patterns, depending upon the
cell, or may be non-glycosylated as when produced in bacteria. In addition, the peptides may
include an initial modified methionine in some cases as a result of a host-mediated process.

Uses of vectors and host cells 

The recombinant host cells expressing the peptides described herein have a variety of uses.
First, the cells are useful for producing a protease protein or peptide that can be further purified to
produce desired amounts of protease protein or fragments. Thus, host cells containing expression
vectors are useful for peptide production.

Host cells are also useful for conducting cell-based assays involving the protease protein or
protease protein fragments, such as those described above as well as other formats known in the art.
Thus, a recombinant host cell expressing a native protease protein is useful for assaying compounds
that stimulate or inhibit protease protein function.

Host cells are also useful for identifying protease protein mutants in which these functions
are affected. If the mutants naturally occur and give rise to a pathology, host cells containing the
mutations are useful to assay compounds that have a desired effect on the mutant protease protein
(for example, stimulating or inhibiting function) which may not be indicated by their effect on the
native protease protein.

Genetically engineered host cells can be further used to produce non-human transgenic
animals. A transgenic animal is preferably a mammal, for example a rodent, such as a rat or mouse,
in which one or more of the cells of the animal include a transgene. A transgene is exogenous DNA
which is integrated into the genome of a cell from which a transgenic animal develops and which
remains in the genome of the mature animal in one or more cell types or tissues of the transgenic
animal. These animals are useful for studying the function of a protease protein and identifying and
evaluating modulators of protease protein activity. Other examples of transgenic animals include
non-human primates, sheep, dogs, cows, goats, chickens, and amphibians.

A transgenic animal can be produced by introducing nucleic acid into the male pronuclei of
a fertilized oocyte, e.g., by microinjection or retroviral infection, and allowing the oocyte to develop
in a pseudopregnant female foster animal. Any of the protease protein nucleotide sequences can be
introduced as a transgene into the genome of a non-human animal, such as a mouse.

Any of the regulatory or other sequences useful in expression vectors can form part of the
transgenic sequence. This includes intronic sequences and polyadenylation signals, if not already
included. A tissue-specific regulatory sequence(s) can be operably linked to the transgene to direct
expression of the protease protein to particular cells.

Methods for generating transgenic animals via embryo manipulation and microinjection,
particularly animals such as mice, have become conventional in the art and are described, for
example, in U.S. Patent Nos. 4,736,866 and 4,870,009, both by Leder et al., U.S. Patent No.
4,873,191 by Wagner et al. and in Hogan, B., Manipulating the Mouse Embryo, (Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986). Similar methods are used for
production of other transgenic animals. A transgenic founder animal can be identified based upon
the presence of the transgene in its genome and/or expression of transgenic mRNA in tissues or
cells of the animals. A transgenic founder animal can then be used to breed additional animals
carrying the transgene. Moreover, transgenic animals carrying a transgene can further be bred to
other transgenic animals carrying other transgenes. A transgenic animal also includes animals in
which the entire animal or tissues in the animal have been produced using the homologously
recombinant host cells described herein.

In another embodiment, transgenic non-human animals can be produced which contain
selected systems that allow for regulated expression of the transgene. One example of such a
system is the cre/loxP recombinase system of bacteriophage P1. For a description of the cre/loxP
recombinase system, see, e.g., Lakso et al. PNAS 89:6232-6236 (1992). Another example of a
recombinase system is the FLP recombinase system of S. cerevisiae (O'Gorman et al. Science
251:1351-1355 (1991)). If a cre/loxP recombinase system is used to regulate expression of the
transgene, animals containing transgenes encoding both the Cre recombinase and a selected protein
are required. Such animals can be provided through the construction of "double" transgenic animals,
e.g., by mating two transgenic animals, one containing a transgene encoding a selected protein and
the other containing a transgene encoding a recombinase.

Clones of the non-human transgenic animals described herein can also be produced
according to the methods described in Wilmut, I. et al. Nature 385:810-813 (1997) and PCT
International Publication Nos. WO 97/07668 and WO 97/07669. In brief, a cell, e.g., a somatic cell,
from the transgenic animal can be isolated and induced to exit the growth cycle and enter G0 phase.
The quiescent cell can then be fused, e.g., through the use of electrical pulses, to an enucleated
oocyte from an animal of the same species from which the quiescent cell is isolated. The
reconstructed oocyte is then cultured such that it develops to morula or blastocyst and then
transferred to pseudopregnant female foster animal. The offspring born of this female foster animal
will be a clone of the animal from which the cell, e.g., the somatic cell, is isolated.

Transgenic animals containing recombinant cells that express the peptides described herein
are useful to conduct the assays described herein in an in vivo context. Accordingly, the various
physiological factors that are present in vivo and that could affect substrate binding, protease protein
activity/activation, and signal transduction, may not be evident from in vitro cell-free or cell-based
assays. Accordingly, it is useful to provide non-human transgenic animals to assay in vivo protease
protein function, including substrate interaction, the effect of specific mutant protease proteins on
protease protein function and substrate interaction, and the effect of chimeric protease proteins. It is
also possible to assess the effect of null mutations, that is mutations that substantially or completely
eliminate one or more protease protein functions.

All publications and patents mentioned in the above specification are herein incorporated
by reference. Various modifications and variations of the described method and system of the
invention will be apparent to those skilled in the art without departing from the scope and spirit
of the invention. Although the invention has been described in connection with specific
preferred embodiments, it should be understood that the invention as claimed should not be
unduly limited to such specific embodiments. Indeed, various modifications of the above-
described modes for carrying out the invention which are obvious to those skilled in the field of
molecular biology or related fields are intended to be within the scope of the following claims.

Claims

That which is claimed is:

1. An isolated peptide consisting of an amino acid sequence selected from the group
consisting of:

(a) an amino acid sequence shown in SEQ ID NO:2;

(b) an amino acid sequence of an allelic variant of an amino acid sequence
shown in SEQ ID NO:2, wherein said allelic variant is encoded by a nucleic acid molecule that
hybridizes under stringent conditions to the opposite strand of a nucleic acid molecule shown in
SEQ ID NOS:1 or 3;

(c) an amino acid sequence of an ortholog of an amino acid sequence shown in
SEQ ID NO:2, wherein said ortholog is encoded by a nucleic acid molecule that hybridizes under
stringent conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;
and

(d) a fragment of an amino acid sequence shown in SEQ ID NO:2, wherein said
fragment comprises at least 10 contiguous amino acids.

2. An isolated peptide comprising an amino acid sequence selected from the group
consisting of:

(a) an amino acid sequence shown in SEQ ID NO:2;

(b) an amino acid sequence of an allelic variant of an amino acid sequence
shown in SEQ ID NO:2, wherein said allelic variant is encoded by a nucleic acid molecule that
hybridizes under stringent conditions to the opposite strand of a nucleic acid molecule shown in
SEQ ID NOS:1 or 3;

(c) an amino acid sequence of an ortholog of an amino acid sequence shown in
SEQ ID NO:2, wherein said ortholog is encoded by a nucleic acid molecule that hybridizes under
stringent conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;
and

(d) a fragment of an amino acid sequence shown in SEQ ID NO:2, wherein said
fragment comprises at least 10 contiguous amino acids.

3. An isolated antibody that selectively binds to a peptide of claim 2.

4. An isolated nucleic acid molecule consisting of a nucleotide sequence selected from
the group consisting of:

(a) a nucleotide sequence that encodes an amino acid sequence shown in SEQ
ID NO:2;

(b) a nucleotide sequence that encodes an allelic variant of an amino acid
sequence shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent
conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;

(c) a nucleotide sequence that encodes an ortholog of an amino acid sequence
shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent conditions to
the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;

(d) a nucleotide sequence that encodes a fragment of an amino acid sequence
shown in SEQ ID NO:2, wherein said fragment comprises at least 10 contiguous amino acids; and

(e) a nucleotide sequence that is the complement of a nucleotide sequence of
(a)-(d).


5. An isolated nucleic acid molecule comprising a nucleotide sequence selected from
the group consisting of:

(a) a nucleotide sequence that encodes an amino acid sequence shown in SEQ
ID NO:2;

(b) a nucleotide sequence that encodes an allelic variant of an amino acid
sequence shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent
conditions to the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;

(c) a nucleotide sequence that encodes an ortholog of an amino acid sequence
shown in SEQ ID NO:2, wherein said nucleotide sequence hybridizes under stringent conditions to
the opposite strand of a nucleic acid molecule shown in SEQ ID NOS:1 or 3;

(d) a nucleotide sequence that encodes a fragment of an amino acid sequence
shown in SEQ ID NO:2, wherein said fragment comprises at least 10 contiguous amino acids; and

(e) a nucleotide sequence that is the complement of a nucleotide sequence of
(a)-(d).



6. A gene chip comprising a nucleic acid molecule of claim 5.

7. A transgenic non-human animal comprising a nucleic acid molecule of claim 5.

8. A nucleic acid vector comprising a nucleic acid molecule of claim 5.

9. A host cell containing the vector of claim 8. 

10. A method for producing any of the peptides of claim 1 comprising introducing a
nucleotide sequence encoding any of the amino acid sequences in (a)-(d) into a host cell, and
culturing the host cell under conditions in which the peptides are expressed from the nucleotide
sequence.

11. A method for producing any of the peptides of claim 2 comprising introducing a
nucleotide sequence encoding any of the amino acid sequences in (a)-(d) into a host cell, and
culturing the host cell under conditions in which the peptides are expressed from the nucleotide
sequence.

12. A method for detecting the presence of any of the peptides of claim 2 in a sample,
said method comprising contacting said sample with a detection agent that specifically allows
detection of the presence of the peptide in the sample and then detecting the presence of the peptide.

13. A method for detecting the presence of a nucleic acid molecule of claim 5 in a
sample, said method comprising contacting the sample with an oligonucleotide that hybridizes to
said nucleic acid molecule under stringent conditions and determining whether the oligonucleotide
binds to said nucleic acid molecule in the sample.

14. A method for identifying a modulator of a peptide of claim 2, said method
comprising contacting said peptide with an agent and determining if said agent has modulated the
function or activity of said peptide.

15. The method of claim 14, wherein said agent is administered to a host cell comprising
an expression vector that expresses said peptide.

16. A method for identifying an agent that binds to any of the peptides of claim 2, said
method comprising contacting the peptide with an agent and assaying the contacted mixture to
determine whether a complex is formed with the agent bound to the peptide.

17. A pharmaceutical composition comprising an agent identified by the method of
claim 16 and a pharmaceutically acceptable carrier therefor.

18. A method for treating a disease or condition mediated by a human protease protein,
said method comprising administering to a patient a pharmaceutically effective amount of an agent
identified by the method of claim 16.

19. A method for identifying a modulator of the expression of a peptide of claim 2, said
method comprising contacting a cell expressing said peptide with an agent, and determining if said
agent has modulated the expression of said peptide.

20. An isolated human protease peptide having an amino acid sequence that shares at
least 70% homology with an amino acid sequence shown in SEQ ID NO:2.

21. A peptide according to claim 20 that shares at least 90 percent homology with an 
amino acid sequence shown in SEQ ID NO:2. 

22. An isolated nucleic acid molecule encoding a human protease peptide, said nucleic
acid molecule sharing at least 80 percent homology with a nucleic acid molecule shown in SEQ ID
NOS:1 or 3.

23. A nucleic acid molecule according to claim 22 that shares at least 90 percent
homology with a nucleic acid molecule shown in SEQ ID NOS:1 or 3.

1 CX3CCCTTATG CTQAAGCCAT GGATQATTGC CCSTTCTCATT GTGrTGTCCC
51 IXSACAGTOGT GGCAGTGACC ATAGGTCTCC TGGTTCACTT CCTAGTATTT 
101 GACCAAAAAA AGOAGTACTA TCATG6CTCC TTTAAAATTT TAGATCCACA 
151 AATCAATTTC AATTrCXSaAC AAAGCAACAC ATATCAACTT AAGGACTTAC 
201 OAQAGACGAC C3QAAAATTT6 GTGGATGAGA TATTTATAGA TTCAGCCTGG 
251 AAGAAAAATT ATATCAAGAA CCAAGTAGTC AGACTGACTC CAGAGGAAGA 
301 TGGTGTGAAA GTAGATGTCA TTATGGTGTT CCAGTTCCCC TCTACTOAAC 
351 AAA6GGCAGT AAGAGAGAAG AAAATCCAAA GCATCTTAAA TCAGAAGATA 
401 AGGAATTTAA GAGCCTTGOC AATAAATGCC TCATCAGTTC AAGTTAATGC 
451 AATQAGCTCA TCAACAGGGG AGTTAACTGT CCAAGCAAGT TGTGGTAAAC 
501 GAGTTGTTCC ATTAAACGTC AACAGAATAG CATCTGGAGT CATTGCACCC 
551 AAGGOSGCCT GGCCTTGGCA AGCTTCCCTT CAGTATQATA ACATCCATCA 
601 GTGTGGQGCC ACCTTOATTA GTAACACAT6 GCTTGTCACT GCAGCACACT 
651 GCTTCCAGAA GTAXAAAAAT CCACATCAAT QGACTGTTAG TTTTQGAACA 
701 AAAATCAACC CTCCCTTAAT GAAAAGAAAT GTCAGAAGAT TTATTATCCA 
751 TGAGAAGTAC CX3CTCTGCAG CAAGAGAGIA CX3ACATTGCT GTTGTOCAGG 
801 TCTCTTCCAG AQTCAOCTTT TOGGATGACA TACGCOGGAT TTQTTTGCCA 
851 GAAGCCfTCTG CATCCTTCCA ACCAAATTTG ACTGTCCACA TCACAGGATT 
901 TGGAGCACTT TACTATGGTG GGGAATCCCA AAATGATCTC CGAGAAGCCA 
951 GAGTGAAAAT CATAAGTGAC GATGTCTGCA A6CAACCACA 6GTGTATGGC 
1001 AATGAXATAA AACCTOGAAT 6TTCTGTGCC GGATATATGG AAGGAATTTA 
1051 TGATGCCTGC A666GTGATT CTG66GGACC TTTA6TCACA AGGGATCTGA 
1101 AAGATAOGTG OfTKrCTCATT GGAATTGTAA GCTG6GGAGA TAACTGTGGT 
1151 CAAAAGGACA AGCCTQGAGT CXACACACAA GTGACmATT ACCGAAACTG 
1201 GATTGCTTCA AAAACAGGCA TCXAA (SEQ ID NO:1)



FBATORBS: 

S'UTR: - 1-7 
Start Codon: 6 
Stop Codoa: 1223 
3'OTR: 1226 



Bomologoius protoinat 

Score 

gi 1 7661558 1 ref | MP_054777 . 1 1 DESCTL protein [Homo sapiens] >gi 1 61 . . . 371 
gi|4758508 |ref |NP_004253 .1 j airway trypsin-like protease iHomo ... 349 
91(6467958 jgb|AAF13253,l|AF198087_l (AF198087) adrenal secretor... 277 

BLAST to dbEST: 

Score E 

gi|1679749 /dataset=dbest /taxon=9606 ... 190 3e-46 

EXPRESSION INFORMATION FOR MODULATORY USE: 
library source: 

Expression information from BLAST dbEST hit: 
Primary cancers 

Expression information from PCR-based tissue screening panels: 

Human Testis 

Human placenta 

Human fetal lung 

Human fetal kidney 

Human fetal heart 

Human fetal brain 

Human bone marrow 



E-values for the three homologous proteins listed above, in order: e-102, 3e-95, 1e-73 
1 KI«KPHHIAV1/ rVIjSIiTVVAV TIOLIiVHPTiV FDQKKEYYHG SFKIIiDPQlN 
51 PNFGQSNTYQ LKDIiRETTEN LVDEIFIDSA WKKNYIKNQV VRLTPEEDGV 
101 KVDVIMVPQP PSTBQRAVRE KKIQSIIjKQK IHKUIALPIN ASSVQVNRMS 
151 SST6ELTVQA SCX3KRVVPLN VNRXAS6VIA PKAAWPWQAS LQYDNIHQCG 
201 ATIjISNTWLV TAAHCFOKYK NPHQWTVSPG TKINPPIiMKR NVRRFIXHSK 
251 YRSAAREYDI AWQVSSRVT PSDDIBRICL PEASASFQPN LTVHITGPGA 
301 LYYGGESQND IJIEARVKIIS DtDVCRQPQfVy GNDIKPCMPC AGYMBGIYDA 
351 CRGDSGGPLV TRDtaa3TWYl* IGXVSWGDHC GQKDKPGVYT QVTYYRNWIA 
401 SKTSI (SEQ ID NO:2) 

FEATURES: 

Functional domains and key regions: 
Prosite results: 

[1] PDOC00001 PS00001 ASN_GLYCOSYLATION 
N-glycosylation site 

Number of matches: 2 

1 140-143 NASS 

2 290-293 NLTV 

[2] PDOC00005 PS00005 PKC_PHOSPHO_SITE 
Protein kinase C phosphorylation site 



Number of matches: 2 

1 41-43 SFK 

2 266-268 SSR 



[3] PDOC00006 PS00006 CK2_PHOSPHO_SITE 
Casein kinase II phosphorylation site 



Number of matches: 5 



1 94-97 TPEE 

2 152-155 STGE 

3 270-273 TFSD 

4 307-310 SQND 

5 375-378 SWGD 



[4] PDOC00007 PS00007 TYR_PHOSPHO_SITE 
Tyrosine kinase phosphorylation site 



362-369 RDLKDTWY 



[5] PDOC00008 PS00008 MYRISTYL 
N-myristoylation site 



Number of matches: 3 




[6] PDOC00009 PS00009 AMIDATION 
Amidation site 



162-165 CGKR 
[7] PDOC00016 PS00016 RGD 
Cell attachment sequence 

352-354 RGD 

[8] PDOC00124 PS00134 TRYPSIN_HIS 

Serine proteases, trypsin family, histidine active site 

210-215 VTAAHC 
[9] PDOC00124 PS00135 TRYPSIN_SER 

Serine proteases, trypsin family, serine active site 

349-360 DACRGDSGGPLV 
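
Short motif matches of the kind listed above can be reproduced with ordinary regular expressions. The sketch below is not the tool used to generate these results; the regular expressions are my own translations of the published Prosite patterns for PS00001, PS00005, PS00006, PS00009 and PS00016, and the names `PATTERNS` and `scan` are hypothetical. Only two short fragments of SEQ ID NO:2 are scanned, with `offset` giving the residue number of the first character.

```python
import re

# Prosite patterns rewritten as regular expressions (assumed translations).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",    # PS00006: [ST]-x(2)-[DE]
    "AMIDATION": r".G[RK][RK]",             # PS00009: x-G-[RK]-[RK]
    "RGD": r"RGD",                          # PS00016: R-G-D
}


def scan(seq, offset=1):
    """Return (name, start, end, match) tuples using 1-based residue numbers.

    A lookahead is used so that overlapping matches are all reported.
    """
    hits = []
    for name, pattern in PATTERNS.items():
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start() + offset
            hits.append((name, start, start + len(m.group(1)) - 1, m.group(1)))
    return hits


# Residues 349-360 (the trypsin serine active site region) contain the RGD motif.
print(scan("DACRGDSGGPLV", offset=349))
# Residues 138-145 contain the first N-glycosylation site.
print(scan("PINASSVQ", offset=138))
```

Patterns this short match by chance in most proteins, so hits of this kind indicate candidate sites rather than confirmed modifications.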
Membrane spanning structure and domains: 

Helix  Begin  End  Score  Certainty 
1      11     31   2.281  Certain 
2      203    223  1.014  Certain 
3      291    311  0.791  Putative 
BLAST Alignment to Top Hit: 
Alignment to top BLAST hit: 

>gi|7661558|ref|NP_054777.1| DESC1 protein [Homo sapiens] 

>gi|6137097|gb|AAF04328.1|AF064819_1 (AF064819) serine 
protease DESC1 [Homo sapiens] 
Length = 422 

Score = 371 bits (943), Expect = e-102 

Identities = 176/403 (43%), Positives = 267/403 (65%), Gaps = 4/403 (0%) 
Frame = +2 

Query: 14 KPW^a:AVIlrVLSI*TVVAVTIGLLVHFIlVFDQKK^ 190 

-l-PW-l-I ++I +SL V+AV IGL VH++ ++QKK Y Y+ + ++ FG+ + 

Sbjct: 16 EPW\rtCSLVIFISl*IVI»AVCIGLTVHyVRYNQKKrYNyYSTliSFTTDKI*YABFGRKASN^ 75 

Query: 191 KDIJlETTKrmVDBIPIDSAWKKNyiKKQVVRIiTPEETO^^ 370 

++ + B++V P S 4-v -t-i-K-vOV-*"*- + ++ GV +F STE +K 

Sbjct: 76 TEMSQRIiESMVKNAFYKSPIiElEBFVKSCTVlKFSQQKHGVIiAHM^ 135 

Query: 371 KXQSXUfQKXIUTLRAItP-INASSVQVlIAMSSSTGEIiTVQASaS-KRVVPI^^ 541 

+Q -•■Ir*-+K+++ P ++ SV++ 4-^+^+ CG+R It-l-RXG 
Sbjct: 136 XVQIiVItSBKIi(}DAVGPPKVmPRSVKlKRZNKTETDSYIiNHC^^ 195 

Query: 542 XAPB3U^HFWOASI^QYDNIHQOGATLISNTWLWAAHCFOKYKKPHQ(m 721 

+ WPWQASIiQ+D H+CX3RTLI+ TWIiV+AAHCP YKNP +WT SPG I P M 
Sbjct: 196 EVEB6Ef7Pf70ASIiQWDGSHRCX3ATIiIimTiri>VSAAHCFT^^ 255 

Query: 722 KRNVRI^IXHBian{£AARByDIAVVQVSSRVTFSDDIilRI(^ 901 

KR +RR l-l-HBKY-i- + •fYDI'l-+ ++SS V +++ + R+CLP-^AS FQP + +TGF 
Sbjct: 256 KRGIjRRI IVHKKY KHPSHDYDISIAELSSPVPYTNAVHRVCLPDAS YEFQP<33VMFVTGF 315 

Query: 902 GALYYGGBSQNDItRBARVICXISDDVOCQPQVYGNDXKPGMFCAGyMEGXYDAC^ 1081 

GAIi G SQK IiR-l-AW <l>X C +PQ Y -4- X P M CA6 <l-E6 DAC-i-GDSGGP 
Sbjct: 316 6ALKinX?YSQNHI^QAQVTZiinATTCNBP0AYNZ>AXT^^ 375 

Query; 1082 LVTRDLKOTWYiaGXVSWGDWOGQKDKPGVYTQV^^ 1222 

IjV+ D +D WYI> GXVSWGD C + +KPGVYT+VT R+WX SKTGX 
Sbjct: 376 LVSSDARDXWYXJU3XVSHGDBCAKPtnCPG\nriTlVT^^ 422 (SEQ ID NO:4) 

Hmmer search results (Pfam): 

Scores for sequence family classification (score includes all domains): 

Model Description Score E-value N 

PF00089 Trypsin 274.8 1.9e-86 1 

Parsed for domains: 

Model Domain seq-f seq-t hmm-f hmm-t score E-value 

PF00089 1/1 174 399 1 259 [] 274.8 1.9e-86 
1 TTATATTCAT AAAAGTAGGC ACTAAGTTGA AGATTTATTC ATATAGGATT 
51 TAGTAGCTGC AGCTTTAACC TGTGfGCTTCT GTA6CTTTTC TAATCTGGCA 
101 GTGCSOCATCT GCTATATTAT CTAAATGTTT CCTCAAAAGG AGAAACACTC 
151 TAACAACTTA TCACCCTAGT CTGCTGGCCA CCATTTTCCC TCAGATGCTC 
201 ACAGCTTCTT CCX3TGGGATT TGAAGATAT6 ACTTCCATGA CACTTGATCA 
251 6TATCTCAAT GGGTATTGAA CCACTCTTCA GCTCTQATCC CACGGTTCAG 
301 TTCCTTTCAG TGTGACTATG TGTCTTGGTG GTGGOAQATG TGATTCTTTT 
351 ATCTACTTTC TCCATTTATC TTACTCAGAG GAACTGTGCT CTAATAGGGA 
401 AAmSATTGA AAGCTTATAA ATTTCCTTGA GTTTTAACTT TTCTCCmG 

451 i^rci ' vrrrr ' v cttttcaaat gacttgaaga cacattgata agaitctatg 

501 AGAAAATGAA GAGTTGAACA AATTGAATAT GTATQAGTGA ATQAATAGAT 
551 TAATACATAA ATCSATAAATT TATTAAATAA TTTGAACQAA ATCAATCGAG 
601 AGGCACCGAG AATAAATTTG TGTCCTAGAA GTAAGAAGAC CTGAGTTTGA 
651 GATAACTAGT AGTTCTATTA TACTGGAGAA ATTACTTAAT CATCACTGGA 
701 CTTCATTTTT CTCAXATGGA ARGTAATTCA ATCACACTAA ACAATCTTTA 
751 AGGTCTCCTT CACTTATAAA TOTATtSTTTT AAGCCATTTA G6A66TTAAA 
801 TAATGTCATG TCCCATGGGA CTTCTOTTTO TTGTTCTATT CAAGCATGTT 
851 AGCTTGTTTC TATCACAGGA CCTGCTGCCT TTCCGCAGCC AGTTCTCTAG 
901 ATTATTTTTA ATCAGTCGGT GCACACATGG TCAATATTTA CTCAATAGAA 
951 TTCAGOTTTC CCAAATTCCA T6A6QATTCT TGATTAATTT TATTACTTAT 
1001 GCCAAAACTA TTATCTTCTT AACTATTTTA OGTCCAAACA omTIMCTT 
1051 TTATCCTGGC ATTTATAXAT AAAAAACTTT 1X?TAAGACQ(3 GGTGGAGTGG 
1101 CTCATGCCTG TAATCCOIGC ACTTTOOGAG GCOGAGGTG6 GTGGATCACC 
1151 AGGTCAOGAG ATGGAGACCA TCCTGGCTAA CACCATGAAA CCCTGTTTCT 
1201 ACTAAAAATA CAAAAAATTA GCGGGGOGTG QTGGTGOACG CCTrTAGTCC 
1251 CAGCTATTCA 6GAG6CTGAG 6CAGGAGAAT GGOGTGAACC TGGOAGGCAG 
1301 AGCTTGCAGT GAGCAGAGAX CACACCACTG CACTCCAGCC TGGCAOCCTG 
1351 GATGACACAG OGAGACTCCG TCTCAAAAAA AAAAAAAAAA AAAAGAAAAA 
1401 AACTQTTTTA TAGTCAAAAG AAAAACTTTC TATAAATCAA CCAATCCTGT 
1451 GAAGAAAATA TGAAAAATAT CCTCTGTTTC CAAAAAAATT TA6GCTATCA 
1501 ATATATAC7VC ATAAAGAGAT AAACTCTOAT AAATTGGATA AATAAAATTC 
1551 ACTATAATAG CAAGTTTTAG AGAACAAGCA CGGGAGTTAG TCGACCTGGG 
1601 CCCTT»AACA OATATCCTCT CTCTCATCCT GTGTTATTTC CTGTGTAATG 
1651 TTGGTATCAT TCCTGCCTGA CTCTCMAGA TTTATATGAT TCCTACTCXO 
1701 TCCAGGTGCC TTATTGGGTC TTAGOGGXAA AAAGATGAAC AAGGCTAATG 
1751 CAOCCXMTG AGAAGCTATC TGTAAGTGAA CATACATGCA AACTAATACT 
1801 TGATTCAAOXS TGAGAAGCAC T6TTGCTGAT CATAGGTGCC AGAA6AACAG 
1851 CAAAGAGTXA TTTTTTCXTTC CAAAATTGTG GAAAAATTTT TATCCCC3GGT 
1901 GTGATGCAAT AZAAAATACA CAGCACCACC TTTGAAGTAT TCTT6CCAAA 
1951 TGAATTTAAC CAAAATCTAA TCAAGACTTC AGAGCTAAAG AAAATCTAAA 

2001 ggtaatcx::aa tttataggaa atoagggaxa taaaagaaca agttaaaxaa 

2051 TACCACAGGA AAGCATTCAG ACAAGTCXIAG AAAGTAAGAT ATTCTAAAGG 
2101 ATGrrXAOCT TGATCTCTTC AACAGTC3VAT GTCA3TAAAA ACTAAAAAAG 
2151 AAGCAG6ACT CTTTTAGATT AAAAGAGATT AAAAAGGCAT AACAAACAAG 
2201 TGCACTGCAT GGTCCTOGAT TATGTCTTGG CTTTTACSiAA TCATOTGTAA 
2251 TTAXAATOAA AOGATGOAGG GAACTT6AAO ATGGACTGOG lATZAGATGA 
2301 XATGGCAGAA ATAS^TTAA TTTTTTAGQA GTOTTAAGAO TATCATGGTT 
2351 ATGTTGGAXA TATCCTAATT GTCTATAATA A3X3ATTTGC3r AAAAAGTCAC 
2401 GATCTTTXAT TTCACATTAA AATATAGCAG CAGAAAAAAT AAATOAGCCA 
2451 AAXACAGTAA AATTTTCAAC AATTGATATA AlAAIGTGAT AlATATATGG 
2501 ASOTTCAATT ATACTATTCT TAGTAATTTT TTATOTCTGA ACATTTTCAT 
2551 AATACTTAAA AATAAAAGAT AAAAGATAAA AATAAATGAG ATAATAGATT 
2601 TAAAATCACT TTGTAAACTC TAAAAGGATA GACAOATAAA AGAGATAACA 
2651 AA G TG C TGGA GAAAGGAGGA ATGGTCCCTT TTCAAGCATG TATGCCACCT 
2701 TGGACCATGC TGCTAAGAOA AACCATTCCT GACCACXrACA AAGAOGCCAC 
2751 CAAATGCCTC TAAAATAGAA AGCAOOAGCA ACATTAGGAT TCCCAGATCC 
2801 TGATATTTTT TTTTTAACAC ATCTTCTCAO ACCAAGATGA CATTGAACAA 
2851 AATTAAAOAC CTTTTTGCAG GGAAAGGTAG GCTACAGCAA CrrQAACTTG 
2901 TCZAAOaAGA GCTGGAAAAC CTGCAAOCAT TGCTATCTGA QAOTAACCAG 
2951 TOGGCCCTTC CTTTTCTCAG aACAGTGOaA TTT6GCACCC GAAGCAGAAA 
3001 TGCTQAAGCC ATGGATOATT GCOSTTCTCA TTGTOTXCTC CCTGACAGTG 
3051 GT6GCAGTGA CCATAGGTCT CCTGGTTCAC TTCCTAOTAT TTGGTAGGTA 
3101 AAATTAAAGA TTTCACTCTA TTTQA17TTA TTTTTCTGCA AAGCTCCATT 
3151 TACATATATO 7AAATGTAAC TTCATCTAAA AAATT6CACA TTTACCTTCA 
3201 AATTTCCACA QAQTMATTT AACTGTTTGA GTCATTTCAT CAAC3VAACAA 
3251 QTACTAAATT CTTATTATAT GTQAGTACIT TTCTGGAXAT TCAAQATACA 

FIGURES 
3301 GCTTTAAGCA AAOTAGACAG ATTTCTAATT TCCTTAGAGC TCTCAACCCA 
3351 GAArrCTTTT GAGJUITCTAC ACAAAAAGAT CAAAAATTGT AATTGTCTGA 
3401 AACTTACTA6 TAATTATAAT AAACAACTCA TCACTTATTA TATATTAAAA 
3451 TGAAAAGCTA TGATAAATTA GTTATTAAAA TTGGCTCTTT TACTCATOAA 
3501 CCATCATTTT CTGTCCAACA TTTCTAAGGC AAAAGAAAAA CACTTCTCTA 
3551 ATAAAATAAG GAATTTCAAA ATGATTGAAA ACCTATACGT ATGACACAAT 
3601 ATTATCATTT ATTTTTAGA6 AAAAAAAATT TTACTCTTTC CAAAACAATA 
3651 TTCAGGGATT ATATTTTTAT CAACTAATAT ATTTGTAATT ACACAAATAA 
3701 TGCACTTCAA GATTCTCTTT TTACATTCAG TCTCTTTCTG GGOAGAATGC 
3751 AAGCCATTTA CATTTTTTCA CAAATCTCTA CAATGTGACT CTCACATGGA 
3801 TGTATGTGAT AAAACAAATA ACTCAGGCTG CTCACTTTAA OGCTCTTATC 
3851 TGCTGTCACC TTCACAGAGT CAATGGGGQA GCAAAGACTC TACTTGGAGC 
3901 CTTAAA6G6C TTAAQATCAT AGTCCTAGGC CTTATATGAT AACCCCAGCT 
3951 GTAGTTTATA CCATTG6CAA AAGATTCTCA GGTCACTTTA TTTGGTTGCA 
4001 TAAAAGTCTC TTTACAATGA GAGTAAGGTT TGTTAACAGT ATGGATTATA 
4051 TGGGTAAGTA ATCAGGATGT CCAAAAATGT ATTACAAGGT CCAGAGATTT 
4101 CCCACTTAAa ACATAT6CCT TCCTGATATC CCTGTTTCTT TCCTTQC3TTT 
4151 GTA6TCTOGA AACCCaCTCC CTCTTCCCTG AGCCAGGCTT CTCAAGGATT 
4201 GAGCTTGTTT TGTATTTTTC CCATTCTCTA TCTTTAACTC TGTATCTTTC 
4251 TTACTCCCrC TGGGCCTTAC TCCTCAGATT ACCAAATTCC TTAGQAGTCT 
4301 CAACTGCTTT CCTTTCTTAC ATTTCCTAAT AQATTTATCC CTGTTTCATG 
4351 CTCGTCTTGT CTTCAATCTC AGACAGCTCT TCTCTACACT TTCTTTTCAG 
4401 GTTTTTCTTA GTGTGCCTGG CTCTCTTGTT AAAAATCAAA ATTCACAAGG 
4451 ACAITCACTT ATCTCTACTT CCACTAGAGT GTATGATGGT ACACATTTCA 
4501 ACTCA6CAAG GAGCAATGTA GCAATGAAAT GTTCAAGCTC TACAGCTAGEA 
4551 CTGGATTTAA AACTTGGACA C3GCCACCTAC TAGTTACAGA ACAATTTACT 
4601 TAATOCCTCT GTGCCTTAAT TTCCTTATCT GTAAAATSAA GGXGATACCA 
4651 ATCTTASAGA GCTGGTGTG6 GGATTAAATG GGCTAATACA TAAAAAGTGC 
4701 ACAGGACA6T GCCTGCCATA TTGTAGAAAC TCAAXAAAT6 GCAGCTATTA 
4751 TAATTGAXAT AAAACATTAA CTGTTATTTT TTAAATAAAA CTCAATTATG 
4801 AAOAGGCTCA GGGACATATT CAAGATTTAT- ATTGGCCXXa TTGTAATTQA 
4851 GTTCTGAAAT CTTTGTCCAA ACCATTTAGT TTCCTATTTT TCATTTCCAT 
4901 TGCAGACCAA AAAAAGGAGT ACTATCATGO CTCCTTTAAA ATTTTAGATC 
4951 CACAAATCAA TAACAATTTC GGACAAAGCSk ACACATATCA ACTTAAGGAC 
5001 TCAOQAGAGA CX3ACCGAAAA TTTGGTGAGT CAGGTAAACT TCTTTTTATC 
5051 ATA6AATAAT GCAAGTGGAA QGGATTTTGT GGATCATTTC TCCATTTCTA 
5101 AAAACATGAT TTTCAGACCG CCAACATTAG AATCATCTTG CAGATTGCTA 
5151 GGOCCCATCC CAGACCTGCT TAATCA6AGT ATGATGAGAT GGGTAGGTGG 
5201 GGAQAGQAaA GTAAGGGAAT CTGCATGTCT AACAAATGGG TGATTCTAAT 
5251 AAGCCTCTCT TTCTAACTCA GCTACCTTAT TTAAAGGTAA GAGAATTGAG 
5301 GCCAAGAXAT CCTAGCCOGT TTCTTCCCCA ATTCCACCAC GTTTCCCCTG 
5351 TAGAAAAGCC TAATCATACC AAAACTAGTT TTTATAAGTC CACACACTTG 
5401 TTTGZAAGAC CACATTTTAA OATTTTGAGT ATTTTCAGAA TTTAC3GTTCA 
5451 TCTTGTAAGT ATATTSATAA AGACAAAAAA CCAGACTTAT TTTGZAGTAA 
5501 TCAAGTCAAA TGCXAATAAT TTTOTTAAAG CTAAAGTGCA AGACTGCTCC 
5551 CAAAAAGAAA AAAAGCACAC TCAGrTTGTAT AATCATTCCA CTCAGAA7GC 
5601 CCATGAACTC TCACTCAAAA ACTAGGTTCA AATTAATTTT TCTAACAAGG 
5651 AAGCTkCAGAA GCAGAGACTT ATTTTAAAAA GAAAGAAATG ACAAATGTAT 
5701 TGQFTTTOTTT TAATCAAAOA ACCATTTTTA AGACACTTTC TTTCCCAAAT 
5751 CATCIAOCAT TTTTTCCTGT CATCATTTGC TCTTTGTCCA TAGTATACCT 
5801 A AIGG CATCA TATTT/yCAAT AATATTGTAG AGTTIATAAT CTCTATTTTC 
5851 AGTXAACATT AAATCATTCA CAATTTCTTA ATTTTGTGGT TTCATCTTTC 
5901 CX3ACCAATA ATTAATGTCT ACAGATTGAT ATAGATTCTG CATTCTTTCA 
5951 CATGCAGAGC ATCTTATAAA AGAGCATTTG CAATCAGTTC TTAAGTTATG 
6001 CTAGGATGAA OGGGGAGCCT GCACCAATAC ACCCAAATAC CTTCTCTACT 
6051 OCTOC3M3TCC TRAGTGACTC CACATAACCT CCTCGATGCA AAAAGAGAAA 
6101 ACTCTTAACT T6CCTT7W3TT AAAAAGATAA ACACACCTTT GAATGAXGQA 
6151 AAATGTTACA ATTTACTGGG AAATTTTQAA ATTTGTTTCA TTTATATTTT 
6201 ATGQCXAACA TTACTOCTAC TGTTGTTGTT GTAAGTTAAC TAG6CAATTC 
6251 TGTCTTTACr GAAGTAAACG GACAAGAAT6 CAATAGGTCT TAAAAGAAGT 
6301 GAGAGAAATO CAGAGGT6CA TGTTGAACAG AAACTCTATT TAAAAGTGGA 
6351 GrTTTAAGTT TCACCTAAGC ATGTOTTCCT TCAAAGGCTA AGGCTAAGTT 
6401 AAGTAAOGAC ACATTATCM CATGGGTACC TGCAAGGCCC T ' i ' CX ' CTUtfrr 
6451 GTCATTA1TT ATTTATCCTC CTTTATCACC ATAGCATAAG CCCTTACCCT 
6501 CCCCXXTTTGC AOOAAATCAT TCTATGTTTC ATGTGGTATT CrTTTGTTTG 
6551 TATTCATTCT TACAAAAATA TGTTTT6CTA TTTTQOGTAC ACTTGCTTTT 
6601 AACTTACATT TTC3TGTTATA AATCACTTTT GTTTCATCTC TTTTTACTGA 
6651 GAACTTTTTA AAAGATATAT GTTACIAAAT ATACCTTTAG TTTATTGCTG 
6701 TTAGCTGCTA ATTCATAGTG TGTATCTTCC ATATTTACCT GCCTGTCATG 
6751 CCAAGAAATG CCACACTAAA CAGACTCCTA CTTACCCCCT TATAGACCTA 
6801 TGCAAGTACT TCTGGAAGCA GAATTACTAG GTCATTGAAT GTACATATAC 
6851 TTAACTTGAC CAATTGGTGC AGGTTTGCTC TTCAAAATQG CTGACTCAGT 
6901 GTGCAOGCCC ATCTACAATG CATGAGGATT TCTATGTCX:C CACATCTAAC 
6951 CAACACTTAG TGTCTTAGTA TGTTTAGGCT ACTACAACAA AAAATACCAT 
7001 AGGCTGGGTA TCTTAAACAA CAAACAATTA TTTCTCATAG TTCTGGAGGC 
7051 TGAAGATTCC AA6ATQAAGA TGATCAAGGC TCTAGCAGAT 6TCTGGTGAG 
7101 AGCCTGCTTC CTGGTTCATA GAATACCATC TT6CTGTGTC CCTCATOGCA 
7151 GAAGCCATAA GAGAACTTTC TTTTGTAAGG ACACTAATGA CTTTCATGAG 
7201 AACTCCACCC TCATGACCTA ACTATCCTCC AAAGGCCCCA TCTCCTCTAT 
7251 CATCX3GTTTG GGAGTTAAQG TCTCAAAATA TAAATTTCAG GGGAACACAA 
7301 ACATTCASTC CACASCACTT 6GTATTATTT GGCTTTCTAA ATTTGCCACC 
7351 CTAATATGTA TAAAGTAGTA TTTTATTTGT GATTTAATTT GCATGTTTCT 
7401 AATTACTAAT GAGTTTGTGC ATTGTTACGT ATAATTATTA ACTTTTTGGA 
7451 CTTTCATTTC TATAAATTGC CTGTACATAT TATTTGCCTA TTTTTCTGTT 
7501 AAACTTGCTT TTTCACCTTA TTTGTATTGC TTTGCAQAA6 TTCTTTACAT 
7551 TTTCTGGATA TTGATAGTGT GTTGGTTGTG GACACTGCGC TTATCCATTC 
7601 TGTCTTCTAC TAATATGGAC CGTGTTGTTC TTTATGAAAC CGAAATCTGT 
7651 AACTGAAGTA ATCATTTTTT CACTGTTTTG CXmATGATT GTATTTTGAA 
7701 GCTTTTCTTT AAGAAQTCCT TCTTCCCTTC TAAQACATAA AAATATTTTA 
7751 CTATOTTACT TATTAACCTT ATAGTTTTAT CTTTTACATT AGGTCTCC3A 
7801 TACATGTGQA ATCCACCTTT GGATGTGTTA GGTAGATTCA GTTrTTTTAAT 
7851 TCATATAGTG AGCC3U3TTTT TGAATATAAC TAGTTAAAAT ATCTTGGCTT 
7901 TTCCTAATAT ATGGIATTAT TATTGAGTTC ATTGCATGCA TTTCTTOGCA 
7951 CCTGGGTCTT GCAGAAAAGG AAACATGAAT CTGTCTCCTC AAATTGCTTC 
8001 CAATCTTTTT GGAAAGATGT GAGTAACACA CATGGAATT6 AATATC31TGA 
8051 CATGATATAA TTAAGGGCTA AATTACATGT TGAGGACAGT AAQTACAGAA 
8101 AAACTTCAAA ACCAAACAAG GGTTCXXIATG GTCAGAAAAG GCTTTATATT 
8151 ATTTTACCTT TGTTTAAATG AGACAGGTGT TTTTCnCCrC CCATCCCGCA 
8201 CCAGGTTAGC TTTAGAAGAA TTACAGGAA6 ACTTTATGCC TCATCCTGAG 
8251 CCACACCTGT TTGTTGrTGC TAAATCCCAA TGAATACAAC CAGATTCTTC 
8301 TCTCTGTCCT ATATGGGTGC TAATTAGACA ACCAAGGAAG AACAGGTTGC 
8351 AOGTCCTGTT CTTCCTCACA TTGGGCTTTA CTGATTTGAA TGCAAATTGA 
8401 GATGCAAAAG TAAAAATGAG TTCATATTTA GATATTGCTA TAATCCGCCC 
8451 CTGTTCCCTG AGATAGTGGA GCAGACATAT CTCATCTCTC ATATCATTCT 
8501 TCAGAGAAGG GTCCATTAAT CAGACATTAC TGATGTCTGA TTACTGCCGG 
8551 CTGGCCATCC TGCAGGTGGA GAAGCATOGC ATCCAGCAGA AACT6ACAGC 
8601 ATGCACTTTG AGGGAGGGAA GGATAAGCCA GGAATTTATG CTGAATAAGC 
8651 TQCCTAA0TA TACATGTTCA ATAAGTTCTA GGGGAA6TCA CAAATACTTA 
8701 TGAAT^GGAGA AACAIAACTA TGTGCAATTG AGCTTTATGT CTCTTCATGT 
8751 GTTGCAT6TT CAAAAAATG6 T6GCATTA6C ATGATCCAAG GGTGGAGTTT 
8801 TCAGCCATTT GATGTTCAAA GGTGAA6CAG AGGACACAAA ACCCTTACTA 
8851 TGCATCCTCT GTGAGTCAGC CAAAACCAGT CTGGACT6CT AGCTAGATTA 
8901 ACAAAQAAAA AAAOAGAAAO AAGATACAAA TAAGCAOGAT CAGAAATOAT 
8951 AGAGGTAACA TTACAACCAA TCXCACAGAA ATACAAAAGA TOGTCTGAGA 
9001 CTCTTATGAA CACTTCTATG TAGATAAACT AGAAAATCTA GAOGAAATGG 
9051 GTAAATTCCT 6GAAAAACAC AATCTTCCAA 6ATTGAATCA GAAAGAAATT 
9101 GAAACCCTQA ACAGACCAAT ATTGA6TTCA TACTTAAATC AGTAATTTAA 
9151 AAAACTTACC AGCCAAAAGG AAAAAAAAA6 GCCCAAACTA OATCGATTCA 
9201 CAGCCAAATT CTACCAGAOG TACAAQAAAT AGCTAGGACC AATTCTAGTO 
9251 AAACTATTCC AAAGAATTGA GAAGAGACTT CTTCTTAAAT CATTCTATGA 
9301 AGTCAGCATT ACCCTAACX3C CAAAACCTCA CAAAGACAOA ATOAAAAAAG 
9351 T^AAATTACAG GCCAATATCC CTQATQAACA TAGATATAAA AATCCTCAAC 
9401 CAAATACC3US CAAACCAAAT CCAGCAGCAC ATCAAAAAGT TAATTTTCCA 
9451 AAATCAAGTA GOCTTTATTT CTGTGATGCA AGACTGGTTC AACATATGTA 
9501 AATCAATAAA TGC3QATTXAC CACATAAACC GAATTAAAAA CAAAAATCAT 
9551 ACAATTAGCC AGGCATGGTG GCTCACACrT GTAATCCCAG CACTTTGGGA 
9601 GACCATGGTG GGCAAATTAC CTGAGGTCAG AAGTTOGAGA CCAACCTGGC 
9651 CAACATGGTG AAACCCCATC TGTATTAAAA ATACGAAAAT TAGCCGQGCA 
9701 TGGTGGCAGG TGCCTGTAAT CCCAGCTACT CGGAGGGCTG AGGCAGGAGA 
9751 ATCACTTGAA CCCAOGAGGC AGAGGTTGCA GTGAGCCGAG ATOGTGCCAT 
9801 TGCACTCCAG CCTGGGTGAC AGAGCAAAAA TCCATCTCAA AAAAATTAAA 
9851 AATTTAAGAA AATTAAAATC ATACAATCAT CTCAATATAT GTAGAAAAAG 
9901 CTTTTGATAA AATTAAACAT CCCTTCATAA TAAAi\ACACT TAGACTAGGC 
9951 ATOGAAGAAA CATACTTCAA AATAATAAGA GCCATCTGTG ACAAACCCAC 
10001 AGCCATCATC ACACTOAATG GGCAAAAGCT GGAGGCACTA TCXTTTAAGAA 
10051 CAGGOAAAAA GACAAOAATO TTCACTCTCA CTACTCCTAT TCAACATAGT 
10101 ACTAGAAGTT CTAGAAAGAG CAATCGAGCA GGAGAAAGAA GGAAAATGCA 
10151 TCCAAATACG AAAAGAGGAA GTCAAATTAT CTCTCTTTAC TGACAATATG 
10201 ATTATATGCC TAGAAAACCC TAAAGACTTT ACAAAAAGTT TCCAAAACTG 
10251 ATAAACAACT TCAGTAAAGT TTCAGGATAC AAAATCAATG TACAAAATTC 
10301 AGTAGCATTT CTAAACAATA ATGTCXIAAGC TGAOAACCAA ATCAAGAACA 
10351 CAATCCCATT TTCAATAGCX5 ACACACACAC ACAAATGAAA TACCTAGGAA 
10401 TACATCTAAC CAAGOAGGTA AAAGATCTCT ATAAGGAGAA TAAAAAAACA 
10451 CTATTQAAAG AAATCGGAGA T6ACACAAAT GAATGCAAAA ACATTCCATG 
10501 CTCATGGATT GGAAGAATCA ATATTGTTAA AATGTCCCTA CTGCCCAGAG 
10551 CAATCTACA6 ATTCAATGCT ATTCCTATCA AACTACCAAC ATAATTTTCC 
10601 ACACAAAQTT AGAA7VAAGCT TTTGTAAATT TCATATGGTA CAAAAAAAAA 
10651 AAGCCCCAAT AGCCAAAG6A CTCCTAATAA AAAAGAACAG AGCCAGAGGC 
10701 CTCACATTAT CTGACTTCAA ACTATACTTT AA6GCTACAG TAATCAAAAC 
10751 AGAATGGCAT TGSTCAAAAA CAGACATATA AACCAATAGA ACAGAATAGA 
10801 GAACCCAGAA ATAAAGCCAC ACATCTACAG CCATCAGATA TTCAATAAAA 
10851 TTAACAAAAA TAAGCAATGG GGAGAGAACX TTCTATTCAA TAAATGGT6C 
10901 TGGAATAGCr AGCTAGTCAG AAGCAGAAAA ATGAAATT6G ACTCCTATCA 
10951 CTAAAXACAA AAACZAACTC AAGATGCAGT AAAGAATXAA ATGTAAGACC 
11001 ACAAACAATT AATACAAGAA CCCIAGAAGA AAACCTAGGA AATACTGTTG 
11051 TAGACATCAG TCTTGGCACA GAATTTAGGA CTAAGTCCTC AAAAGGAACT 
11101 GCAACAAAAA CAAAAAXTQA TAAGTTGGAC CTAATTAAAC TAAAGAACTT 
11151 CTGCACAATA AAAGAAACTA TCAACAOAGT AAACAAACAA CCTACAGACT 
11201 GGGAGAAAAT ATTTGCAAAC TATGCATCTG AAAAGGTCTA ATGTCCAGAA 
11251 TCmZAAAGA ACTTAAACAA CTCAACAAGC AAAAGAAACC AAGTAACGCC 
11301 ATXAAAAAGT AGGCAAAGAA CAT6AACAGA TGCTTCACAA AAGAAGACAT 
11351 ACAAOQCAGT CAAGAAACAT AT6AACAAAT GCTCCACATC ACTAATTATC 
11401 CAAGTAATGC AAATCAAAAC TACAGTGAGA TAATATCTCA TACCAGrrTAC 
11451 AATOGCTATT ATTAAAGATT AAAAAAATAA CATGCTGATG AGACTGCX3GA 
11501 GGAAAGAGAA TGCTTAAATA CTGTTGGAAA CGTAAATGGG TTCAGCCACT 
11551 GTOOAAAGCA CTTTGGAGAC TTCTCAAAGT ACTTAAAAT6 OAACTACTAT 
11601 TCAACCTAGC AATCCTACTT ACTGGGTGTA TACCCAAAGG AGTATAAACT 
11651 TTTTTCCCAG AAAGACAGCT GCACTCTCAC ATTAATTACC ACAGTATTCA 
11701 CAATAOCAAA GATGTGGAAT CAACCTAGAT ATCCATCAAT GGTGGATTGG 
11751 ACAAAOAAAC TGTGAGATAT ATAITGTATAT ATATCTATAT ATACCATGGA 
11801 ATACCATSTA GCCATAAAAA AGGATGAAAT CATGTCCTTT GCA6CAACAT 
11851 GGA!rGTAACA CCACAAGGAA GGCACTTTTA TCTCCTCTTT ACAGGTAAGA 
11901 GAACCAAGCT TCTGAAATTA AGGTCCATAG CTGGAAAAT6 ATGGAGGGGA 
11951 GATTTQAAGT CATCTAGGCA ACTCCACACA TGTGCTCTTT CCACTAAATT 
12001 GTTCTACIOT CAGGAAGGQA CTCAGCTAAO ACAGAAGATA AAATTATTAA 
12051 AATCTAAATC AATTCTTCTC TCATTTCATT TTTTAAATCC AT6AAGATTA 
12101 TAAATCCTCT ATGCTGTGCT AGCXAACTTT TTCTTGACAG ATACATTAGG 
12151 TATACTTATT AGAGAAAAAT ATTCTCTTTC TCATTTCCCT OTATCAGTTT 
12201 TTGCTTQAGGA AGGCAAAGGT AGGAGGAACr GTAATAGAGA AAOATGAAGG 
12251 AAGCTGATGG ATATATTGAC ATGTGTATGT ACATCTAGTO TGAACAATCr 
12301 ATAGTT6GAA GAAAGGTGT6 GATGG6TATG CTTTTTGAGG GAAGTTTTTG 
12351 AQAAAAGAA6 TAATATGAAC TATTTCTAAA TTTCCTGATA AAGTT6TAAA 
12401 TACA6CATAG TCTTCACAGG AGAATCTATT TAGTTTATCA TCATCATTCA 
12451 GCAAATACAG CATGATGTTA OGCACXATAA AAG6CTAAGA AAAATOATTC 
12501 TCTCTCTCTC ATAAACIAAT CCAATTTAGA GATTTAGAAO ACAACAAATC 
12551 TGGA6AGGAC ATQAACCTTC TAAATAATGA CCTTCCCTTG CTTTG6GTAT 
12601 CCTGGTTTTA AATATTTTTA GTACAGCTTT AAATAGATCC AAATQAGATA 
12651 TTTTCCrCTT TTACAAAA6C AATTCAAAGA TCTAGGTTTT TOTTOXACAC 
12701 TQAQAATXAA TACTTTTTTC TTTAAAATCC TTAATTGCAA ATCTTTAAAT 
12751 TCTATAAAXA TTTTQCCTTG TQATCTCAGA AATATAA6CC AATTTOGGAT 
12801 ATGOAXATCr AATATATTGC TACrTGTTAC ACGTGAGTAG T6ACA6ATGT 
12851 CTGTCCATTT CTTTCTQACA TTCCACAAAO AAACACTGAA 6AAGQACCAG 
12901 T6CAATCAAA GAAATGACIG ATGGCATCAC AAAATATCAC ATCCCATTTG 
12951 ATGATCTQAT TACCTTTTTG TTTAGGGTGA 7CAGAAAGTC ACAGTTTCAT 
13001 6GCACCCTCC ACACCCS^CAC ACCTTGTATG ACACTGQATC CAACTGCTTT 
13051 CTCCAATASA CACAGCACTT AAAGAIGTGG CAGTTAG6CT TGACCCCAAG 
13101 AAOQCCAAAA AGCCTTCTGT GAGCATCACT CAGTGCTCAG C7TTGACTAAG 
13151 CTCTATCCAG GCTTGAGAGA ATGQITCATA OCTGACTTCT TGGATCCAAA 
13201 
13251 
13301 
13351 
13401 
13451 
13501 
13551 
13601 
13651 
13701 
13751 
13801 
13851 
13901 
13951 
14001 
14051 
14101 
14151 
14201 
14251 
14301 
14351 
14401 
14451 
14501 
14551 
14601 
14651 
14701 
14751 
14801 
14851 
14901 
14951 
15001 
15051 
15101 
15151 
15201 
15251 
15301 
15351 
15401 
15451 
15501 
15551 
15601 
15651 
15701 
15751 
15801 
15851 
15901 
15951 
16001 
16051 
16101 
16151 
16201 
16251 
16301 
16351 
16401 
16451 



MAAAAAAAA 
AGQT^CTGCAC 
AAGAAAATAG 
ACAAACTGCT 
CCTGCATGCT 
AA6CTACATG 
GTCACTTCCT 
GGCTGATTTG 
GCATATATGG 
TACCAAGGAG 
ACAQGAAAAA 
TCTAOGAGCA 
TTATAAAATT 
CCTTTTTATC 
ATGCAAAGGA 
TCTCTATGCA 
ATAAATATAT 
AAAGAATAAA 
CAGGTCAGTT 
TTGGCCTTGA 
QAAGGTTGTT 
AAAATGA6G6 
GCAAGTTG6T 
GAGGCTCAAA 
GQCAATOAAA 
AAATCCCTCT 
AGAAAGCAAT 
TTAATTCTAA 
ATCTCATTTA 
TTTACAGTGT 
XAAGGGGAGC 
TAAACABAGT 
TGTTTGAGTT 
CCTTCAATTG 
AAAATTATCA 
AATGCTAGGA 
GAOAGATAGG 
AACAATAGAT 
TCTGAAACTT 
TTCXAATAAT 
AGAAAGCTAG 
TGTTTTTTCT 
TTTACCAGTC 
GTATCACCTA 
TCTAGAGAGA 
GTGATCACCA 
GCTCCTATCT 
GGTTTAGACA 
AGTCAAAAAC 
AATCAAATGT 
TTAAATACCC 
TCTCCCTGTT 
CXKTACTGAG 
AACAAAGTCA 
6CAGATAAAC 
TACCAATAAA 
TCTTTGATCT 
TTATACTTTA 
CAAAAGAAAC 
OATTGAGTGG 
CCCAGCTTAC 
TGAQTCACAG 
ATTAGAGTTA 
TOAATTCCTG 
AACAGCTTTC 
GCAAGTTGGT 



AAACACCTAG 
TAAAAACTAC 
GCTGCCTTT6 
TGCPGCCCAT 
CATCCTGAGG 
GCCCTGGTTT 
CTTCTATGAA 
AAAGCAAATG 
CAGACAACAG 
GAGCTTTCCA 
AAAATCTCTT 
TTGATCTCAA 
TTCCTGGTAA 
AATTTGTTAA 
CTATGCTAGA 
TAAAGGGTTT 
AGTGAATATA 
AGGTATAAGA 
TQAGATTAAC 
AGGATAGTTG 
ACAGACCAAA 
CCATGCXTTGA 
CAGAGAGGGA 
GC5CTCAAATT 
CCTC3M3TQAA 
lATATAATAT 
AGGATCTAAA 
GGAGCTCACC 
TATTCCATAT 
TAGCCACAAA 
CAGTCAGGAA 
TAATTCAGGA 
GACAGCCTCT 
CCCTC^TGQA 
AGCAGAAAGA 
GACAAACTCC 
ATAGATTTAC 
TTATAACAAC 
CTGGGAACTT 
6AAAGCCATC 
TTTATACATA 
GATTTTCCTC 
CCTCTGCCCC 
AJUSATTCTAA 
GAGTCCTAGA 
AAGTTACTTC 
TCCCAGACAA 
TTCATAGCAG 
TTCATCCAAX 
OGTTGAACTT 
AT^TTGCTATT 
CCATTTTOTT 
TAGTAGAGCT 
CCCTTTCAAA 
CCCACACCAC 
GACATATTTT 
TAAGTTTAAG 
GGTCAATAAA 
OQAA6TAAAA 
GAAGAGGCXSG 
TAGTATCTGC 
ATTTCTAACA 
AAGAAGATAA 
AGCATGCTAA 
CCTATATTTO 
TTTTCTTCTC 



AGTTTTATAC 
CAAGATEATG 
GAGAG6GGTG 
CAGTGGGTAG 
TAGGCACTGT 
CAGTGAACTC 
ACGTGAATAA 
AGCTCAAACA 
TGATTCCCAC 
CAAATAATCA 
CCX3ATAATTC 
CCTGATGTAA 
ATGCAAAACr 
TTCAACAAAA 
TTTTATAGAC 
6CCATTTAAT 
TTGCATAAAT 
GGGATAAGAA 
GAATATCCCC 
TGATTCAGGA 
GGAACAGCCT 
AAGTACTGGT 
GAGGACCTTG 
CCCrATTTTT 
GCTTTATTTA 
CCTTTGCATG 
GTTTTTTTTC 
TGGrrGTTAOG 
GCCAGTCTGC 
ACACTCCCTA 
GCACCTGCAG 
ATGAAAGCTG 
CCCTCACrCT 
•ACTTAATCTC 
GATACTACCC 
AACXACAAAA 
AAATTGCTAA 
CTGGCACACA 
GGAATGCCAG 
ATCTGCCATG 
A6CTCCATTC 
CTGCTQTAAA 
ATTTCTCAAA 

GGCxrrccxcc 

ATmAXAAA 
TGCCTAGATT 
CCTAACAATT 
6CAOGGT6CC 
GCCTTCACCA 
ATTCAGAGGG 
TGCTlTTi'lTiTC 
TGCCITTTTA 
GGCTGTGGGT 
AACATGTCTT 
CTCAOCTAAA 
GGATACTAGC 
GCXA3TACCA 
TTACTTGATA 
TTTACATCAC 
ACCCTACAGA 
GTGATGCCAO 
GGAATQAAGA 
CAAATAATGT 
GQAAGTTATA 
TGTGTQTQTG 
ATTCCTTGAQ 



AGATAT6ATA 
ATTCTTATTT 
CAACAGTTTC 
GAGGTCTTAG 
GAAGGCGTTA 
TGTGGTGTCA 
TCATAGTACT 
CAATGACATC 
TATTATAATT 
ATTACCTAAA 
ATGTGTAATT 
AGCAAGCACT 
TTCTGATAAA 
ATATACTACA 
TATGAAAAOA 
AAAAQAGACT 
ATATAATATA 
AAATTGAGAC 
AAAGAAGGTA 
ACACAGAACT 
GAGAGGCGTG 
6GTGITGAAG 
AATGTCTAAC 
ACCTTGAGTT 
AGOCTAAAAG 
TTACTCTTGT 
ACAGCATGGT 
TTGGAAAAAA 
AGTGACAXAT 
ASTGAATACA 
AGAATU^GCAG 
AAT6GCTGGG 
TTCATTAAAT 
AGTGTAATTT 
TGAAAGAGGG 
TTCTAGAAAT 
TGCTATT7UGG 
GCTTTAAATA 
AACGTTGGCA 
GAAACAATTT 
TACAATAAAA 
TTCATTTTAT 
GCGTTGTCCT 
GATGTAGTAA 
GAAGGATOCT 
TTTGGTTTCT 
TCTTTTGTTC 
TATAAGTTTC 
CAGGGTCTTT 
CTTATGTTCA 
CAGAATTCTT 
CAGACTACCT 
ATGAGACTTT 
TTTTATTATT 
TATTTTTACA 
ATTGGTGCTT 
TOTCATCCAG 
AATGGCCAGG 
ACAAACTTCT 
GFTATCACTTC 
TCCCTCACCT 
AAATTGTTAT 
TTTGTTTCTA 
TTTATTTAAQ 
CCAAATGCTA 
TCTACAA6TT 
ATGTGTTCTA 
GTGQTTGTGT 
GOGTTTCAGT 
ATAATGCCTC 
CAQAATTGTC 
GAGGTATTTA 
TQATCCCTGA 
TTTCAGTCAT 
ATTTTGTGCG 
16501 CACATTTGGT TCTTCTGTCC AACATGAAjCT C5TAGTACCTT ACCCACATTG 
16551 AGATGACACT ATTTCTACCA AGTGAGTGCT AGGGGATACT GCAAGCCGAA 
16601 TQCCAG6T6T OAGAOACCAC A6CATCACAA TACCGTGGCA GTAGATTAAA 
16651 GCTOTGCATA TC5GACTAAAA GCAGTGGCTT TGCTTCTCCT ACCTTGGTGA 
16701 CATAAACTGA GTAACAAATT TGACCTAATA CTGQAATACC ACCTAATTCT 
16751 TTTTTCCTCC CTGATTTACC CTAGAGTCCA CAATTGACAA TAATTTAAAA 
16801 ATTTTOGCTC TCTCTTAAAT CCCTAATX5CC TCCTCCTTAC ACCTTACAAG 
16851 CAAAQACC1X3 CAGAGCTAAO ACCTGTAATG CCAGGATGGA G6CTAGAGGA 
16901 CCATCAGCAA TTAACTACCA AAACTTACCC AACATTTTAT ATCTGTTTAA 
16951 CCTTCRTAGC CTTATGAGTA GCAGATCAAT ATCTTTGTTT TACAfiOTTAG 
17001 AAAACTQAGG CTCAAATTQA TTCAGTAACT TTGCCAAGAT TGCCCAGTTT 
17051 GOaAAAAOTA GTATACJGCTC AAATCCAGGA CTGAGGC3W3G G ' in ' T T CXViXJ 
17101 TCACCACTCA AAGCCTCTCT GAATATCCTA TCTCTGCTCT GTATCTCTCT 
17151 GCTACTCCTT CIATGGTGTT TTAGCAAGAT ATCTTCTACT CCAGAAACCT 
17201 ACTCTA6CAC AGXAGAATTA CTT6GGTAGG TTTTTTAAAA ATATGAGTGC 
17251 CTAOGTCCCC TCTAQACCAA TCGAAACCAA AATTCTTGGA GAGGATCCCT 
17301 GGCATCCATA AATTTTTTTA ATTCATCAAA TGATTCTGTT GCACTGTOAA 
17351 AGCTGAGATC CACCAATTTA AATAATGATG TTAGTTCTGT GAT^AAAATTT 
17401 TTGATTGCTT TAACATTTAA TCAAGGATAT ATTCCTATTA TAAAATATAT 
17451 TATTAACACA TAGTTTCTCT CnXJriXJlXJT AACAGGTGGA TGAGATATTT 
17501 ATAOATTCAG CCTGGAAGAA AAATTAXATC AA6AACCAAG TAGTCAGACT 
17551 GACGTAIGIA TGTTTGGGCA AAOGTOGAAT CACAAGACTG GAGGGAAAAO 
17601 GAACAAAGQA GACAGGGACT CTCATGTATT GTATGTCTCC ATGOACTAGG 
17651 CTTTTX3GCIA GAATTTTTCA TAAACATTAC CTTTAAAGCA GTCTTGAAGT 
17701 ATAGGGCTGA CCACCSTTTT GTCAACAAAA AGACTAAGAT TCAGGAAOGG 
17751 TAAOAAATAT 6TTCAAAGTT CACCAACTGA CAGTTTCCCA AAGTGACAGA 
17801 ACCAGGAATC AAACC(3CATT AACTTATTOT GAOGCCTGGA ACCTACCAGA 
17851 ACCCATGA06 TGGGGAATkAC CCAGCAGCTT GTC3GTTGCAT GCACCAAGTT 
17901 ATATTATGTT GACAATXATA TTATTTCAAC CACGTTAAGC AGGCAAACTT 
17951 GGCTAXAAAA TGGGTTCACA AATTTTACCT GTAATGTAAC OGAAT6ACAT 
18001 AAGGCATGOC TAAACAAAAA QATATTCCTG TTGTAATAAA ' m ' lXrrrX ' CT 
18051 GTCATGGTGG AGGGGGAAGA CTCATATCAO TTGCAGATAT TGCTCAGAAG 
18101 TTTCAATTOT GTTATTTTGA AAAACTACAT AOCAGAACAC GCATGTCATA 
18151 TACACAAATC CAIGAGCCTG TATGACTCAT ATTTCTTAAA GATAAAGAAA 
18201 AATAATATAT TCAGATTTT6 ATTTATTTGA AGAAAATAAT TATCCCTTTC 
18251 TCACGAATAG ACTAATAATG CTTTGTTGGC AGGTGTACTC AAAGTTCTCT 
18301 ATGTCTTGAC TGAGTAACTA QTGACTTCC3Q TAAGGATTTT ATAACAXAAA 
18351 TTGGGTAATT CCTACAATAC TTAGGAGGGA AAAAGCATAT AAATGCTAGA 
18401 ACTTTCEAGA TTTCATGTTT TCTGTTTTCA AATTCTCCTT TACCATATTA 
18451 TTGTAGCAAC ATTATTATAC TCCTGTGAAC TCCTTT6GAT GGTAGCCATC 
18501 ACTATAZAAT ACdGGTAAA AATGTTAATT CCTCAGATTT AAGAAGTAAA 
18551 ATTAGTCATC TGTTTGCCAA TTTGACATAA AATTCrAGTT ATTTAGATCT 
18601 TTAXATTCCA OAGCCTCAAAT QAACAAAAAT ACAXAAATTG TCTCAGAATT 
18651 TCCTTTTAGC CAAAAGATTC AGGGAQATGG GCCTCTAGAG TTTTTCACAG 
18701 TTTTTTTTTT TTTTGTAAAA AAAAAAAAAA AAAAAAAAAG GAGAGATAAC 
18751 AGATCAAXAT ATATTAGTTT CAAGGTTTTT ' it:yi ' fi " X " i - x - i - i » TTTAAACAAA 
18801 AACCTGTAAT TGCTTTTCCT ATTTTAACAG TATTTAAAAG TTTAGTTCCT 
18851 CAGGIAACAO AACTTQAACC TQTTTATATG ATCAAAGTTC AAGAAATTGG 
18901 GCATGTTTAA TTTGGAGAAG ACTOGGGGAC CACAATATTG TTGTCTrCAA 
18951 AXATTTOGGC TAGAGQAGQA AATTATITTA TGTATGTTCC AACTGGTAGA 
19001 CCTAAGCCTT ATGGAATGG6 AGATATAGOG AGACATATTT CAACTCAAAA 
19051 TGAraAACTC TTAAAAGCAG AGCTGACCAA AGAGAAACAA GCCTCTTTAO 
19101 AAAATTAAAC TTACXATCTT TTTAATTACT 6CACTQTCAT TAGAGGGCCA 
19151 ATTGTCATGG ACCCTGTAGA AGTGATTCAG GTATCSUU^A TACAATTGAT 
19201 TAGCCTAAGA AAACATGAAG GCTTCTTCTA ACTCTCAGAG CTTGTAATTT 
19251 TGATQATOAT TTTTTATATC TGTCATTCCT AGCTGCTGTA ACAATCCTTC 
19301 AAATTAATGO GGGAAATGCA CTOAAAACAT AATGAAAGCT AGAAGA6GGA 
19351 ACAIATQAAA TGACCTTGGG TCAGAATQAC ATQAGAGGAT CAGCACTTGA 
19401 CACTCTCAGC AACTGAGGQA TCATTCAGQG GAGGAAGATA CAGGTAAGAC 
19451 TGAAOGACAA TTCCAGGTGT ATTCTTTGAA AATGTACCTT T C r mtTlW 
19501 QTCACAGTCC AQAGQAAGAT GOTGTGAAAG TAQATGTCAT TATOOTOTTC 
19551 CAGTTCCCCT CTACTGAACA AAGGGCAGXA AOAGAGAAGA AAATCCAAAG 
19601 CAXCTTAAAT CAGAAGATAA GGAATTTAAG AGCXTTTGCCA ATAAATGCCT 
19651 CATCAGTTCA AOTTA ATOGT AAGGAGGTCC CCTTCTATOT GATATGAAGT 
19701 TOTCTATTAG OTCCATQTTT TGACGAATCT CAAA3TTATT 3OTCATTATT 
19751 TCCATTTCAA ATAAXAGCTA QAATTCAQAT GAAAAAATTC AAGrTAAAGA 
19801 TGTGACATTT CAA6GTGTAT TAGTCTCTAA CGTAAGOITG TCTGAAGTTA 
19851 GTCATCCAGT GGTmCCCG ACA£3TAATTG ATTGGCACTC ATCCCAAAAT 
19901 ATAGGCAAGC ATTTACAACT AACAGAGAGT TAATCCCACC CAG6CACTGC 
19951 CTCCATGACT AAOCAAGTGA AAATACTAGG GGTTTAGCAA TAATTGTTTT 
20001 TCTGGGTGGG ACCITCCTAA AACACAAATT CATGTCnTQC CATACTTTTA 
20051 TTGATAGTTT CTATATATGG TOATATACSU^ TTTTTGTTAG CTTTTTTTCC 
20101 TATGGGCMT TQGQAAAATQ GCAAOCCAAC TTTC5AAGTTG TTAOAGTCAT 
20151 TTTACCATTA ATC3CTTTAAA AATCACAGTC TAGGAAAACA TCACTGAAAC 
20201 TATGTGTACA TTGTTCCACT TTTCTCTTTT TTTTTGTTCA CCCTTAGCCC 
20251 ATTATACCAT TATCaCTTCC CTCAATTAAG GAGAACAAAC CTITATCAAG 
20301 GTCTATCTCT ATGGCCTTTA CCTTAAGTAA CTAATTTCTT TTTATATTCC 
20351 AOTGACGTAC GCAAATTCAC CTTTATAQAA GTGAAATTCA CACAAAAAGA 
20401 GTTGAGGAAT TCAGTAATTA AAAGGAGCTA AGAATCAAAT TTAAATCTCT 
20451 AATTTCTTAA AAGGCTCCAA TTAAAAAAGG TTTCTATAGT CAAACACATC 
20501 TTAAAAATTC T6GCTTTGAT ACTCGTTTCT TGGAAATTCT TCCTTATAGT 
20551 QTCATATTAA AAATTCTAAG 6C3U3CCAGCT AGAGAGAAAC TT6TTTACCC 
20601 TCGTCCGCTA AGCTGTTTGC ACAOCftTCTT CTTCCAACAG ACAAGTATAG 
20651 ATTTCTCCTA CAAATTTCAA TGGATACCAG ACCTAAGTX3T TACAGAAGAG 
20701 ATTCAGGGCA AGCGATTTTT ATCAGACATG AAACAGGACA CTCTGCCCTT 
20751 GTAAGGGTCT AGCTGACACT TCAAQAGGAA ACCAQATAAG GAAGTAAAAA 
20801 ATGT6AGGTA ATQGAAIGGG CAGATGTTT6 CTQATGTGAG AACQAGTCAG 
20851 CTACTTAOGG AAXAAAGCTQ AGGACCTCTC CCAGCCAGAA GG6AGGAACC 
20901 TGACAAGTGC TTAATCCRTC TTCTTTGTTA GATGGGGAAG C3UUm3AATA 
20951 GAA6TTGTQA AACAATGGGC ATTCTGATAA TTTACATOAT GCTTTCTGTO 
21001 TAATTTCCAA TAAATAOTTA ATTTGTCAGG AATGTAAAAG CCTGAACTAT 
21051 CTQAAACCAG AGTAAAGCAT AAATTGTTCA TTGGCTGCCT GGTCTTTTTG 
21101 TTTTTTGTAG GCTCAGCTTC TAAACTTCA8 CTTATTTTAA TAATTGTACT 
21151 AAATTAAATG GTAGGATATG CTAATGGAGA ACCTGA XT1X3 AGAGTCACCT 
21201 GAGGCTGGGC ATGGTGGCTC AAGCXTTATAA TTCCAGCACT TTGGGAGGCC 
21251 GAGGCGGGTG GATCACCTGA 6GTCA6GAGT TCAAGACXZAG CCTGGCCAAT 
21301 ATGGTGAAAC CCXSGTCTCTT CTAAAAAXAC AAAATATTAG TCAGGCCTGG 
21351 TOACGGGCAC CTGTAATCCC AQCTACTTGG GAGACTGA6G GGGAAGAATC 
21401 ACTTGAACCC GGGAGGGG6A GGTTGCAGTG AGCCAASATC GCGCCACT6C 
21451 ACTCAAGCCT GG6CTTGACA GAGCAAGACT CCATCTCCAA AAAAATAAAA 
21501 AATAAAAGAG TTACCTGACC AATTCTAACT CCACTAAGTC ACCACAGGAC 
21551 CACCCAAATA ATTGGCTCAr GCXrrTTGTCT TCATTTTCTC ATCTGTAAAA 
21601 TTCCAATGGT AATGTTTGTT CTTCCTGAAA TCACAQAGAG ATTOTAAOGA 
21651 TATACAAGGA AAXAGAAAAC ACAATGT6AA ATAAAGAGGC TGTTACTAAT 
21701 GAGAAAACTA TXftTGTTGTG CAXATGCTTT GGAAACCTGA AATCATTAAT 
21751 TTGAGTGATT 6ACTAGTAGC AGAAAGATAG ATCCTTOAAA GTTTCAGAAT 
21801 GTTCAATGTA GAAAGAACAG TGTTTGTTAG TGATATOGGA GCCTAGGGGG 
21851 TOTTGCTTTT CTGGCCAGAA ACCTCTGTQG CCAGTGGTTO GTGCCTTTGC 
21901 CGAAGTTTTG CTCTGOCCCA CTOGOCTTGT TCTGCCGACT TGACCTGGCA 
21951 GACTGT6CCC ACCTTCOGCT ACCAGCCTGG ATCCCATGCC CACCAAGGCX: 
22001 AACX:CAGGCA TGGAGCTGTG AGGGTTGTCT GAGC6ASGAC AGGGTCTGGC 
22051 CACTGCCCAC AGOCAGGCAC ACTOGCTGCA GCATGAOGGG CAGCTCCAG6 
22101 CAC3X3GCACA GOTGTOCTGT CTCTCTGTGA GGCTQTGGCT GGACAAAGCT 
22151 CACTGCAAGC AGCTTCCCTG GCAGGCACCT GGGAAT6TGG TGGGACCCAG 
22201 6AAGCTTGQA GATGCCAOGA ACTGCAGGGT CCCAAAGAGG GAGTCACAAC 
22251 CXrPGGCTTOG GGAGCTCXXA GGTCTOGGAT CCCTAAAGGG CTGCAGCTTT 
22301 TCTCTCTTTT TACCCACAAT GTGGCCAGCA AGGGGTATOT TTCATTCCTO 
22351 TTTGTOTTAC AGCTCTTTCA OTCTTGCTAT TTG6CAGGTC CTOAOTTCTT 
22401 GTCCTGAGAC CAAGAAGAAT GAGGTATOCA GACAAGT6GA GGGTGAGCAA 
22451 GA06AAGAAA GGTTTACT6A GCAAGAGAAC AGCTCACAGG AGACCCACAG 
22501 TGOGCAGCTC CTCTTCAIAG CCAGGGTGTC CC3UICAAGTG TCCAGCTCCT 
22551 AGCAAAGAGG AGGCCCTGQA OGTAGAAGCT CCTCTCTGCA GGCAGGTTGT 
22601 CCTGTTGAGT GTTCAGCTTT CAGCACAGAG TAGGCAGTAG GCXXnTAOAGT 
22651 GGTCTATCTC CTCTCTGCA6 GCAGGTAOTC CCSOtSGTCTC CCAGTCAOCT 
22701 CTCCATCTOC AAGGGTCCAA TQCTGCCTCC AGCACXTTCTC TGCCCACCCC 
22751 TCOOTOCCTG AOCAAGCTGC TCCCCCACCA GTGGQCAACT CAGCCCAGCC 
22801 CXATTGTGGT AGCTCCCA60 OTGGCAGGCT CTGGOGGGCT CCCAGGGAT6 
22851 GGCTCCAAGG ACTGTCCACC TTCTCCCCAC GCCCTCCCTG CAGTraGCCAT 
22901 6GTCAAGAAT GOCAATGTGG GGCCAGGTTC CGGAGCAGGA GAGGCTCCAG 
22951 OCCTGGGAGC AGOTCCTOCC TGQTCACX5TG AGGTTQGGGG TOGCACAGTC 
23001 GGCTGCCTCA GQGATGTGGQ ACACAGGOGA CCCACCACCA TCACT6CTAC 
23051 TCCCXK3VTCC OCTCCTGCTA CCACTOCTCC AGACAGCCTG TAGCTOCCAT 
23101 CACTAGCACT TAAGAAAGGC ACATTCAGTG GACAGCTCAG GAAAATCTTT
23151 ACXrrCAATTT TTTATAQGCA AAAACATTGT TTCCTGGGCA AACAAAATTT 
23201 ATGGACTACC AATAAATAGA AAACTGTAOA GATTCTAGAT TAAGTCTAGA 
23251 AATAATCCTG TAGCCCAAGA TTTATTTATA ATTTGTCAAG AATCTGTATT 
23301 TTGTTTTGAC AAAAAAAAAA CTGTGTGGTG TGGGTCCTTC AGGAGACACA 
23351 GTGTGACAAA GCAAAGCTAA AATCAACTTC TTTGCATTGC AAACACCAAG 
23401 OCTGTAGTCA AGCAGCTCAC TGCCTATGTG TCAGATGACT TTGCTTCATT 
23451 TTTCATCATG ATACTTGXAG TCTATAGAGC CCTGAATATT AACTAGCTTT 
23501 CTCCCAACrC AGAACCGTGT TAGGAGGTGG TTGCTTTCAA AACTAAAGTG 
23551 TTAATGTTTA TTTCCATTTC TATACCAGGA AAGTAAAAAT CTTTGGTCAA 
23601 AATTAGAAAT CITTAACAAC TAGTTACTTG TOTATTGACA 6TTTGTTTCC 
23651 AGGT6TAATC ATTCTCCCTT AAAATCC6GT TATATTCACG ACCATTATAC 
23701 TTATCCTGGT ATCATTCCTG GAAATGGCTA ACTTGCATCC TGCTCAGACT 
23751 AAGTTGACAA AGTTTCAATT GAAGAATTCT AACTTTAT6C TATTTTCCAC 
23801 TTTATTGCAT TACT^AAGGAC AAAATATATA 6TTTTCTTAA AAATGAAATA 
23851 AATTTACTGC CTTAAACTAC ATTTGACGGT AAACTGAGTT CCTTCCATAO 
23901 AATAACCACX AACA6CAATC GATGGTCCTG AGCAATTGAC TCTTCACCAT 
23951 ACAATGATTT GGGATGCCTT TAAGGGTATA TTTGAATTGA ATATTTTCAA 
24001 AAGCTCCCAC TTTGTAGAGT TTATCATCAC TAGTTTCCCC AGTGGAATTT 
24051 GTAGAAAGTT AGTAGAATGA AACAATCTTA TTTTGTATAA TGAGGAATAG 
24101 AATACIGAGA ATGTGTCTGA GAAACATGGC ACTGGTA6GA AAAAGTAAAC 
24151 AOTTTATTCT CATCTGCTCA ATAAGCTAAG TCATTTTAAC TTGAAAATCA 
24201 TCAAAA3TTT GATGAAACCT TCCACCAACT TTATTTTTCC CCAGCTTTAG 
24251 TATkOATAXAA TTGACAAATA AAAATTGTAT ACTGTATACA ACATGAT6CT 
24301 TTGATACATG TATACAAGTT TAAATATTTG TGTTTCCTTA GTCAAACTCC 
24351 TC3kCTTTTTT GGAAGTT6AC AGAATTTAAT CTTGGATTGT 6TCCAATAAC 
24401 TAGCTTTTAC CACTATTCAG TATATTTTGG ATAAGAAACA CATAACAGTT 
24451 TATTCTTTAA AAAA6CAATT TTACTATTTA GGAACTGTGT TTAAAAAGCA 
24501 TTTTAAATAT CATTTATGCA AGAGTTTTCA AGGTTTTTTC ATTCTAAACC 
24551 CTTTAACCAA AAAAAAAAAA AAAAAGATTT ATGTGAAATT C3GAAGTAAAT 
24601 AGAAGAOATC AAAGCAGATC TGTTCTGGCT GAGGCTGAGT TTGAGACCTG 
24651 TAA6ACAGTC TACTTGCCAT ATGGCTTGGC TGTGTCCCCA CCCAAATCTC 
24701 ATCTCGAATT GZAGCCCCCA TAATTCCCAC ATGTTGTGAG AGGGACCTGG 
24751 TGGGAGATAA ATTAAATCAT GGGTGCAGTT TCCCCCATAC TGTTCTATGG 
24801 TAGTGATkTGA GATCIGATGG TTTTATAAGA GGCTTCCCCT TTCACTTGGC 
24851 TCACATTCTC TGACTTGCTT GCCACCATGT T^AGACATGCC TTTTGCCTTC 
24901 CTCCATGATT GTGAGGCCTC CCCAGCCACA TGGAACTCTG AGTCCATTAA 
24951 AOCTCTTTTT CTTTATAAAT TACCCAGTCT CAGATATGTC TTTATCAGCA 
25001 GTGTGAAAAC AAACTAATAT AACCTGTTTC CTCTGTCCCA TTTATCCATC 
25051 TTCTGAAGTG QAATGCAAAG AA6CTTTACC CCGAACTGCT GGAAAACCAT 
25101 A3TTCTCTAT TAATACAAAC TATTT6TGGG CTTTAGTCAT CCACTATTTG 
25151 TQCCTTACTC ACCCATTGCT TGTGATAGTA TCCACCTAAT TAGAGGCTGC 
25201 CXATAAOTCT CTACAAAAAC TGTACACAGA TGTTGTTATA TCAGATAGCC 
25251 ATTCTCCTAA TTAATCTATA TGTTCAACTG TCTAGAATCC ATATATGGTC 
25301 AGTATCCTCT GATTATTCCT GGTCATTGAG ACCAACCAGG AAAATATCAA 
25351 KSTKTCACTA TTTOTTrCAT CTTCTTTTTC A6CAATGAGC TCATCAACAG 
25401 GGaAOTTAAC TGTGCAAGCA AGTAAGTCAA GTTAGCTTAT AXAAACAAGT 
25451 TCAATTTTCA CATCAGftAAO GACATTTTCA AATATTTGCT CATACTTGCC 
25501 CATCTGTOCT CCAGATTTTC TTTGAGAGAT AATAACTATT TGTACGATAG 
25551 ATTTAAATAC ATTTTTTTTC TAACTCATGG ACTGATCTTT TAGTCATGTT 
25601 CAAGAAAAAA ATTGCCATGG TAACCTTCTG GGGCAATTT6 AAGAAAGCAT 
25651 TTATTTTTQA TTGGOAATAT TGGACTTGTT TTTCTAATTT TTAAAAATGC 
25701 GATAAAATGT ACTTTCTGCT ACAAAATAAA ATAATAAGAA AGTAATCAAT 
25751 AGGAAGGACA TAAAACCCAT TGTCTGTGAC TGACAATTTG TCTGTGAAAT 
25801 ATGCIAAGGT CA6GAGTTCG AGACCAGCCT GACCAACAT6 GAGAAGAAAA 
25851 CCCATCTCTA TTAAAAATAC AAAAATTA6C CAGGTGCGGT GGCAGGTGCC 
25901 TGTAOTCCCA QCTACTTGG6 AGGCTGAG6C AGGAGAATCA CTT6AACCTO 
25951 GGAGGCAGAG GTTGCAGTOA GCCAAGATTG CACCACTGCA CTCCAGCCTC 
26001 AGGQACAGAG TGAGACTCCA TCTCAAAAAA GAAGAAAAAA ATATTGCTTAA 
26051 TAQATTCATC TTAATCSGCTA ACAGTGGCTT CATTAAATCA CTTCAAATCA 
26101 CTGTGG CCIA ARTTTTGAAA GATTTTACAA AAAACAGTGA TGAATTTQAG 
26151 CAATGATOrP CATGCATTTG CCTCTGTGAC TTGCAAACAC CXTTAAGTATT 
26201 TTTATCCATG TGTTTATTCA TTCAACAATA TCTTTTAACA TCTACXaVAGT 
26251 OCCAOAAATT AOACCAGGAO TTGGTGGTAC CATTGTQAAT AAAACATGAT 
26301 CCCTQCTCTA AAATTAGAAT TCCAAAGTAO AGAAAGATAT AAATAAATCA 
26351 GGAAGTATGA AAATAATGTQ ATTAATGCTA TQACAGAGGA AGTGCATA0T
26401 GCTATGAGAG TTGATCAGAG AC3TCAGCTAA CCTGTTCTCA CACAGTAAGA
26451 AAGTGAACCC TGAAATGTGA GAGAGAAGAO 6CCATGAATC CAGTGACAGG 
26501 TGGG6TAAGT GTCCTGGGCA GQAGGAGTAG TATACGAAAA TGTCTTCAG6 
26551 CAAGTAAQAA TGGGGTCATT TCCTGTAATT ACAAGATGTT TCTTATAACT 
26601 TAATGATCTC ATCTTTTTTC AGGTTC3TC3GT AAACGAGTTG TTCCATTJlAA 
26651 C3GTGAACAGA ATAGCATCTG GAOTCATTGC ACCCAAGGCG GCCTGGCCTT 
26701 GGCAAGCTTC CCTTCAGTAT GATAACATCC ATCAGTGTGG GGCCACCTTG 
26751 ATTAGTAAGA CATGGCTTGT CACTGCA6CA CACTOCTTCC AGAAGTAAOT 
26801 TATTGACCTT AAGTTAGAAC CCACTTCTGC TAAAAAGCCC TGAGTTTTGT 
26851 CATATTCTT6 GTAACAATTA AT6TCTCAAA TATTACTGAA GTAAAATAAG 
26901 AAAAAGTTAT TTCAGGTTCT TTTCTAAAAT AATGTTACAC TTGCATACTT 
26951 AATCAGAAAT TTGATGGGAA TAAGTAACA6 TCATTATCCT AfflATCCATC 
27001 AATC31TTTCC TCAAAGTTTT TAATAAGGAA ACTGTGTAAA GAAATCAGAA 
27051 CTATTTTGTG ACATCCTAAC ACAAAATATT CACTAATAAC ATGTACCATr 
27101 AATCTTTTOT CAAACAATGC TCTCCACTTA AAACTAGTGT CTGTTTCTGC 
27151 CAAACACTTG GGCGAOTCTC ATACTGATCT TAAATAATCA AACTAATTCX: 
27201 AAAGTAAAAT GGAAATTTTC AATAAATGCC GGAAGTTGGT AACCGTGATG 
27251 ATG6AGAACT GCAGATCAAA TTTAGAGCAT TQACATATGA AGATCTGTGG 
27301 AATCAGAACA GTTXACAACC AAAATGAGAG ATTGCTA6CA TGATAAAGAC 
27351 AGGCACTTCA AAAGAGATTC CTCGGA6TAT CAAAG6ATTC ATAGAGGGCC 
27401 TTGGGCCACT CAATQTtSACC TTCCCATAAT AGAOCATCTC TTCACAATAG 
27451 TQACACAAAA GACAAAGCTG AAGTGAAGAA TAGCAAATTG TGCTATCCTA 
27501 TAATTQTTTC TGAAT6CATA CATTTTATTA AATAXATGAT TAAATGACTT 
27551 TTTRTAACTT TTAATCTTAC TTTTCAAQAT AATAACCAGT CATTTTTATC 
27601 ACTATTACAT TTAGAATTTT AGATTTOTTT CTAA6TAGAT TAACTGTATC 
27651 GCCTTTCTTC TTCATTGCCA ATTATTACA6 TAATAACAAA GACTTCTTGA 
27701 GTATCTCTAT ATAATAGGTG GCAGCAGGAT TTAGTOGGAA AAATATGTCC 
27751 CAGGCAGTTG GAGAGCTGG6 CAAATTATT6 AACCTTT^TTG TATTAGGTAA 
27801 TAGAXAGGCT AGATCTTTTC ACATTCTTTT TGACCXATAA AATTCTAACT 
27851 TTTGTTACIA TAATAAATTT GATTTGCCTA GGAGCATAAA TCTTTATAGA
27901 GACTCTTAAT ATTCCAAAGA ATATACA13VT TAAGAATCTA OGCTTG6CAT 
27951 GGT06CTCAT 6CCTGTAATC CCAGCATTTT GC3QAGGCCGA GGCAAGAOGA 
28001 CXACTTGAGC TCAOGA6TTC AAGACCAGCT TGG6CAA6AT AGTQAAACXX: 
28051 CATTG66CAT GGT6GTGCAT ACCTATCATC CXS^GCTACTT GGGAGGCIAA 
28101 06CAGGAGGA TCCCTTAAGC CCAGGAGTTT GAGGCTCCT6 CAAGCTATQA 
28151 TTGCACCACT GCACTCCAGC CTGAGTGACA ATGCAAGACC CCATCTTAAA 
28201 AAAATAGTAA TATATTTTTA AAAATAATCT ACATAAATTC TTAATOTTTG 
28251 AAAGAIX3TGA OAGCTCAGTA AfiCTGATATA TTAQAAAGCC AGAAATCCCT 
28301 TA3X3CTGGTO TCTOGTTTTT CAAAGTAAXG GGAAACTTAC TTTGCCAAAG 
28351 TTAGCCATTT TTGTGGTAGA TAGXTCTATT TTTGCAAATA TCTTCATAGC 
28401 ATT6AACACC AAATCTATAC TCZATTAACT TCTACCATCA ATATTT6TTT 
28451 TTCTTTTAAT CTGGAACAAC AGGAACCAAT r iTA r ri CT r CMTCATATA 
28501 ACAGCTATTC TTrAGTTTCT CTTTTTCAGA CCAAACATAA AATOAGGQAG 
28551 AATATCCAAA CCATAA6TC3A AAATAAAXAT CATTACTGT6 AGCTTTAGTT 
28601 TGCTAAGOAT AATGACCTCC AGCCCTATCC ATGrCCCTGC AAAGG6CAT6 
28651 ATTTTGrrCT TTTTATGGCT GCATAGCATT CCX31TGGTGT ATGTATACCA 
28701 CMTTTCTTT ATCX3W5TCEA TCACTAAIGG GCATTTA6GT TGATTCTATG 
28751 TCTTTGCXAT ACOQAAOAGT GCTAGAGGGA GAGGATCAG6 AAAAATAACT 
28801 AATQQGTACT AGGCTTAATA CCTGGGTOAT GAAATAATAT GTACAACAAA 
28851 ACCXXAT3AC ACAAGTTTAC CT6TGTAACA AACCTGCACA TGTAACCCTG 
28901 AACTTTGAAA AAAGXATATA TATGCACACA CATATATAIG CATACAXATA 
28951 TATGICTGTA TATAXATGCA TATATGICT6 TGTGTATATA TAAAAAAAAA 
29001 TATATATATA TATATATATA TATAATTACC TCATTTTTCC AOAACCAACT 
29051 TCCAQAT6CC CTACCACATT GGTrCTTATT CTCTGAACAT TCGAGACTTT 
29101 GTCAGTGTCr TCCTTAAAAT ATGCTTOCAA TAACTAAATA CACCAAGACA 
29151 OATaiGTOAC TAGTGTCACA CATAACAAAA TAAAGCA6GA AGTCTTCTGA 
29201 AAAATACAAA TAATOTAAAT TGGTGGaAGA CAGTGTTTTA TAAAGGGAAG 
29251 AGCAOAGAGA GGCAGGGAGA TATGTQATGT 6AATCAAATA GTTTAACCTA 
29301 TCGAQGCrrr ATTTTOCTTA AGTATAAAAC ACAGTCTTTA CTAGATGATC 
29351 TTTCATTGCT ACTAAATGAT TTTTCCOATT CCTGTATGTA CCATAATCXA 
29401 CCCATTQCCC AAGCCX3\CAA QCrAGAAGTC AACOGCATTT ACCACATTTG 
29451 ATCATCTCTC AAAGGACTAT GCAGTCATCT AATAGACTTT ACCACATCXA. 
29501 TTCTTGACCr TCAAOAATCX ACrrCCCCAGA AAGAACAAAC ATU'lTiTiTA 
29551 AAAATGTAAA TGAGACTACA TTATTCTCTG GCTTAATTAT CCAOTAGATT 
29601 CCCAXATGAC TTCAATAAAA TTTAAOCACT TTATCATGAC CIATAAAACA 
29651 CTCTAAAATC TAGTCCCTGC TTACCTCTCC AAGCTCACCC CCAACCATTC 
29701 TTTCCCTTGT GTTCTGACTG OUSCCCATCC AACCCAAGAC CTTGGGATTT
29751 TTGCCTGGAA ACTTGTTTCC CTCATCTCCT CACACTGACC CTCTTTTACT 
29801 ATOTCTTAGC CCAAATGCGT TAXCAAAATA ATCATAATGA CCTGTTAGTA 
29851 CTCTATTCOG TTACXOTATT CTATTTTCTT CATAGCCTTT ATCAATC3TTT 
29901 AAQATTATTT ATCTATTTGT TTGCTTGCTT TGATCCTTTT CCTTCTCTGG 
29951 AATCTTATAC TCCTGTGAGC AGGCACCTTA GGTCCTGTTC ATCACTTTAT 
30001 CCCX:A6CA6T TCAGATAAGG CTCAGCACAC AGATGCTCAG TAAATATTTG 
30051 TGGAAGGGAT AAATGAATGA TATTTTATGT GTATTACAGT TCTAAAATTC 
30101 AATAGTTTTG TATTAAATAT CAGTTCTAAT ATGGCATTTA TATGATTTTA 
30151 TCTTTCAAAA CATTAGCAAT AGATTATATT TAAATGATAA AASAAAACTA 
30201 TAACTGCAGC CAAGTATTCT CAGGATTGTA TTTCTCTTAT ATTAOCCTAA 
30251 AIGCAATTAA TCTAGCTCAT A3ACTTTGOG CAGCTTATAT ATATTCTGTT 
30301 AATTTCTAAC CTTTTCCAOG TATAAAAATC CACATCAAIG GACTGTTAGT 
30351 TTTGC3AACAA AAATCAACCC TCCCTTAATG AAAAGAAATG TCAGAAGATT 
30401 TATTATCCAT GAGAAGTACC 6CTCT6CAGC AAGAGAGTAC GACATTGCTG 
30451 TTGTGCAGGT CTCTTCCAGA GTCACCTTTT CGGATGACAT ACGCCAGATT 
30501 TGTTTGCCAQ AAGCCTCTGC ATCCTTCCAA CCAAATTTGA CTGTCCACAT 
30551 CACAGGATTT GGAGCACTTT ACTATGGTG6 TGGGTATCTC AGGATABCTA 
30601 ACAGAGCGCT AAGCCCTGTC TAAGGCAATG TGATTTCATC TCCATCAATA 
30651 TTATCCTGAC AGCCATTTCC ACACAGTCTG GTTGGATTAG TTAGGGTTCT 
30701 TACTTTGTGT GACAGAAATT CSkATTCACAT TAACCASTGC AGAATAAAAA 
30751 ACAAAGAAAC AAAAACTTCC ACAAAI^tTGG CTCAT6TAAX TTGGAAGTCA 
30801 AAAAAGTGTA OTAAOTTTCA CTTCAGACAC AGGGGTTTAT ATQATGTCAT 
30851 CTGGCTCTGT GTCTCTOAAT TTOAATTTTT TGCCCCTTCT TTTCTCTATG 
30901 TTGGCTTCAT TCAGAGOGAT GCTAGCTTCA CCTAGTGTCA GAGGTOGCTA 
30951 ACAACACCTC AACACATCAT CCTCAACAAA GAAAAAATAC ATAGAAAGGA 
31001 ATATTTATTT CTTTTCTTTQ CCAGAATTCA CATTAATTTC TATTGTTCCA 
31051 GCTGTGTCTA GGAGGACTCA GATTaAGTGG CTAACTCAAA TATTCTTTAT 
31101 GCCZATGZAG CAAAATTTGC TTCAGTACTG AAGAAGCTAA TTTAAGTOTG 
31151 ATGGTGAATA AGAATAGTGT AGAGAXAAAT T6TCAAACTA TTTGTCCCCT 
31201 CTAAAAGTAT TCAACTTGAT ATACTAACTT AGTCTTGTAA GAAATAATGA 
31251 TGATTTAGTT ACTGAATGTT CTAOGCAATC TTAGTGAGAC ACXSCTCrOGA 
31301 TTCTAACATG TGGTCXAGGT ACAXATGXAT AACAAAGCTA GAAAGTTTCT 
31351 TTAACACTGG 6CTTGAGAAA ATGCAAAAGG 6CTTTCTGAG AATGACTAAA 
31401 TCTATTTGCA GGATTCTATA CAATTtATTT ACATACAAGA AATTATAAAG 
31451 AATAAGCTTT TGATTCTCAG TCTACCSiTTA AGGAACTAGG AATAACCTTT 
31501 C3^CrCACATA GGCAGGAATC GGTTTTAGGG TCTCTAGATT TTTTCCAGAT 
31551 GTCCCATCTG Igi ' mumT ATCTTATACA GAGTGAGACA TGCATTGCTT 
31601 TCTTTAAGGT TGTATTACCA ATCACAGAAA ATATTACCTA TGGTTTATTA 
31651 ATTCTAGTAG ATCCAGTGCT GCTGTAAGCC TGACACCTCC CTAGGTCTGC 
31701 ACTCTCrPGG ATGGATTTTC TCTGAAGATA GGGCTTGGAT TCTCTGCTTC 
31751 ATAGTGGTGG 6AAAGACATC ACAAATCCCC TTTGGCTTGG TGGGAAAAAT 

31801 CACr rrCAGG AGrrrGAGAC tggcacagaa acaxacctgt cataatgogc 
31851 tgtqagtggc aacagaatct gacacttaxa gagcactcca ccctacttga 
31901 acacggcctc tcttggtgag tqacccacao gtqcttttaa tciattaaat 

31951 AGATTAAATT AACCTATCAT TCrtAATCTG TTAAGTACAT TAATAGATTA 
32001 AAAGCAGCCA TTOGTTACTC ACCAAGAGAO GCIATATTCA AGTCTGTAAA 
32051 GCAAACCTTA AGAAGTTTTT TAAAATTGAA ATT6TACAAA GTATATTCTC 
32101 TGATCATAAT GGAATCTAAC TAGACATCAG XAACAGAAAG ATAACATAAA 
32151 AATCXXXAAA TOCTTACCAA TTAAAAAACA TATGTAAATA AAGAGAATAT 
32201 CTOGAAGAAA TTTGTAAAAA CAAATAOAAC TAAATGAAAA CAAAAATATA 
32251 TAAATATATG CCAGATGCT6 CTAAAAXAGT GTAGAAAGGG AAATTTATAG 
32301 AAAATGCATA TTATAAGGAA AGATATCAAA TCAATAATTA AGTTCTCACT 
32351 TCAAOAAACT AGAAAAATAA AAAAXAAACC TAAAACAAAC ATAAGGAAGQ 
32401 AAAXAATAAG AATAAGAATA GAAATGAATA AAATTAAAAA TAAACTATAG 
32451 AAAATTGATA AATAAAAAfiC TQATTATrPG ATAAAATCAA TATTTTGCTA 
32501 QAAATGTCAT TAAGCATTTT TACAGAAGAT OAGAtATAGC TCAG6GATGT 
32551 CCAQAATTTA TGOOCTATGC TTTTCATQAC TTGGAATACA TTTTACCAAC 
32601 C3VGTTTAGTT TGCTQAAQAA GrTGTGQATT TGGACTQTCA CCTACTTACA 
32651 ATACTTAGAT TGTCAGTTTC ACCTTACTCT TCTCACCATT ATTTTATTTT 
32701 TATTTTTATT TTXATTTTTA TTTTQAAACA GAGTCTCGCT CTGTCTCCCA 
32751 GGCTGGAGTG CAGTGGCOTG ATCTCGGCTC ACTGCAAACT COOCCTCCOG 
32801 GGTTCACOCC ATTCTCCTGC CTCAOCCTCC C3GAGTAGCTQ OGACTGCAGG 
32851 CGCCCRCCAC CATOCCOGGC TAATTGTTTT OTAGTTTTAG TAAAGAAOQG 
32901 OTTTCACCGT GTTAGCXAGO ATGGTTTTGA TCTCCTGACC TCXSTOATCCA 
32951 CCTGCCTCGG CCTCCCAAAG TGCTGGGATT ACAGGOGTOA GCCACOSCGC 
33001 6CCAGGCCAT GAATGTTTTT AATT6ATGAT ATAGTAGGCA ATATAAAT6T
33051 GTGTGTGTGT GTGTGTGTCT GTGTATAATA TATATAAACC AATTGTATTC 
33101 AAATAACAOA ATAATTTOAA AAATCTCTTA GCATATTTCT GASTTACACA 
33151 CTTAAATCTT COQAGCACTT TTAAATATCT GTTTACAAAC ATTTCTTCAS 
33201 AAATAAATCT TGGAAATCOT CTTCTAAAGA AACTGGTCTA TTAGGGTTTT 
33251 TTCAAATGTA CTTAGTTTTT TTTTTAATTG ATGTArTAAAA TTGCATGTAC 
33301 TTACCATOTG CAACATAATQ TGTTQAASTA TASTATATGT ACACTCTGAG 
33351 TCTTAAATCT AGTTAACTAA GAAQCC?rCTT ATTTTACATA ATTATCATTT 
33401 TTGTGGCAAG AACACTTAAT ATCTACTCTT GTAGCXTTTTC TCAAGAATAC 
33451 GATATATCAA GAGIAOGCAA CCAG2UkGCTG GGGGTCTTTA CAGGGGAAGG 
33501 AUTTAGGGAG ATGCTGGTCA ACAAATTCAT A3TTGCAGTT AGGAAGAAAA 
33551 AGTTCAAGAG ATCTCTCATC GATdTGGTG ACTATAGCTG ATGATATATC 
33601 GTATTCTTGT ATTAGTTTTT TATAA&T6T6 TAACAAATAA TCACAAACAS 
33651 TTAAAACAGC ACTCATTTAT TTTIATCTCA CTGTTTTCAT GAGTCAGAOG 
33701 TTCAGACACA 6CTTAGTTGA GTCCTCTTCT CAGGGTCTCA CCAAACTGTA 
33751 ATCAAQC3TGT CAQCTGGQaT TGTQGCCACA TCTGTaOCTC CTTTGAAGGT 
33801 CTCCTCMGG TTTGCTGGCA GAATTCCTTT ACTCGCAGCT GTAGAATGCA 
33851 TGCCAQCTTG CTGCTTTAAC TCTTTAGGAA AGTGTCTCAA CTCCAGCAAG 
33901 GCTCGCCCTT TTTGAAATG6 CTCASCT6AT TAGGTCAGGC CCACCTTTGA 
33951 TAATCTCCTT TTOATGAATT CAAAGTCAAA CTCATTAGAS GTCTTAATCX3 
34001 CATCTGIAAA ATTCCCTCAT CTTGGCCAXA TAACATAACC TAATCATGAG 
34051 AATGGCATCC CTCATATTCA GAGATCCTGC CCATATTT6G GAGGAG6GGA 
34101 ATCACACA0G AATCTTG6GG ACTATCCTA6 AATTCTGCCA ACCATGGGGT 
34151 CRTGGTTTCC C!AATCAATAT ATGGTTTGQT ATAAAQAATC CCTGAATGCT 
34201 TOTOCIATXC TTASTTTTCT ACX3TAGCCTG CCATAATAAT GGTTTCXAAA 
34251 ACTCAGAACC XAGCTTACAG TCTGCAGCXA CCAACTTGTA ATACATTGGA 
34301 AGTGAAATCA TTGCOGTTTA ATGCATTTAT ATATATATGA aXSTATAATAT 
34351 ATGTATATTT CACAXAXATC TTAXATATGT GAAAGCTCAT CATAAACTTT 
34401 AAATAAIAAA ATAAATGXAC ATAGTATTAT AGGCATTTTA TCAASGCAA7 
34451 GGAGAAAACC ATCXAGGCAT 6CAGA0TTTC TGGGAACAAT CTGGAACCCft 
34501 CAAATATkAAO CTTTACAAAA GATAAAAGGC CTTCCTGAAA TATATAAGCT 
34551 GATTATTTTT AAGGTTAGAT TTTACCaWSOA AJIAAQAATCC AAATGGCTTT 
34601 CTTGCTTTOA QAAGTTTTTA TAAAAATGTG ATTGGACAAT AATTATCX3TT 
34651 AGATGTGCCA GATTTAACCA GAAATTCTTT TTTCTAGAAA CTGCTTATAT 
34701 TAACTTCATT CTCTATTGAC AATTTTACCA TQAAAAAAAT ATTAGQAAAO 
34751 TCTTCTCACT TCACTCTAGC CRAAGATGCT QATTGrAAAT ACTAGAATAA 
34801 CTCTATTTTT CCTTAAGG6G AATCCCAAAA TGATCTCCXSA GAAGCCAOAS 
34851 TQAAAATCAT AAGTQACGAT GTCTGCTU^ AACCACAG6T 6TATGGCAAT 
34901 QATATAAAAC CXGGAATGTT CTGTGCCGGA TATATGGAA6 GAATTTATGA
34951 TGCCTGCAGG GTAASTTGGA GGGATTTTTT TATATTACTA ACTCAAAAAT 
35001 TTGTATCTGG CTTAOAAXAT ATTATATGTT CTTTACaWAA GGACIU^AACA 
35051 TAGAXATCAT GTCASCTCAA AAAAGTTACA AAIGCAAATT TCACAGCACA 
35101 AAATACTTTT AAATGmTA TTAAGATAAA TGMUGTAAOA GTTTCTCTQA 
35151 TGCZATCIkAA CAAACAAAAT TAQAATTTCT TAACCAGAAA TCCAAASATT 
35201 AAIAAAGCA6 TTTATTTTCT CAAGOGOCTC ACATTCAAGA AAGAAAATAA 
35251 TCAIAAACAS AGAAGTAIAA AGIOAICTTA TOAATAATAT AATGAAAAGC 
35301 AAATATTTTT CTTGAAGGAA ACATTTTT6G AACAAGTATC AGAGAGATGA 
35351 GAOGTAAAXA AGGCCTGAA0 AATAAATAAC ATCCAATTTC AOAAXAAGAA
35401 AATAAXGTTA TAGAAAAGAC AAAAAGCATA GCXSVAAATTA TGAAOGTGT6 
35451 AAATTACAAT TCATATCTGA GGQAACTCCA AGTAATTGGT TGGGTCTCAG 
35501 GAIOAGGAGG ATGAGAAGAG AAACAAGTAG ATAACCATGA GAAGGTGGAT 
35551 TAGGCCATOT TOTOATTOCA TGGOCCCTCC CCAOTGCCCT CATCTGCXTTT 
35601 erAACATOGA TOrPTTCCM CGAAGCTTAGG TTTCTTCCTG GAGACACTTG 
35651 CTTTTTAACA TQAGATACTT OAGAACTCIA AG6AGGCCAC TCTAICTGGA 
35701 AATGAT6GAA TGGTATTGAT ATCAG6TQGC AGAAAGTCCT GTCCAGAGTC 
35751 CCACAAACTQ TACCACAT6T 6CGA0CTCTA TCAGAAAAGG AGCAGGGACC 
35801 TATGOXJACAT AOAGGCTOGG CAAAAOCAOG ATCTOGTCCA CAGCXaGCCT 
35851 OGGTTGCTAA TAATOTGGAG GGAGGCAGGC AGAATTTAGG GATTCCAACA 
35901 AAA6GTCCAT ACCAC6GGGA ACAGGTY^GAA GGTGCAGGA6 TCTTGGAGCA 
35951 GAGAGOACCG GGGAATTCAG GTGAACCSVIG ACATTACTaA AAAGCCTTAG 
36001 GAGGGATTGG TGGTGATAGA OATGCTTCAC TQGATTOGGG AGCAGAGGTA 
36051 AACTTGCTGC CTAACTGTOC AAAGTAAGTQ ATAAAACAAG GCTTTAGTCA 
36101 TAGAAAAATA CAGTAA6TXA TCAGGGCAGC GGTTCAGGTA CAAGGATCXA 
36151 AGACAOGAAT ACAGTGATTO TAATTGGGGC AGATGGTQAG GOGCCTAQTC 
36201 T QATA CAACA QAAQTGCAAG CACCACCAAC ACCTOGTCTT TCTCCATAAG 
36251 TCTTTCTCTC CAGAGCCCTC ATGACCTAAT CACCTCTTCT TAAGTCCCAT 
36301 CTCTCAACAC lATTGTMTO GAGATTAAGT TTCCXXAACC TATC3AACTCT
36351 TGGGCTCftCA TTCAAACCAT AGCACCACCC AGCACAAAAG CACAGAGCTT 
36401 CCAATCTGGT TTCTASCTCC AXACCCTAGA ACCAAACAGT AASAATCACC 
36451 TCTGOAAATO TAGCAATAAT ATAATCM*AA TTTTTAAAAT CCAGTGGAAG 
36501 GATTGGAAGff^ TAAAATCAAG 6AAATCTCTC AGAAAGAACA ACAACAACAA 
36551 AAAAGACACA GAGOAGAAAA ATAATCAGAA AAATTAAGAA AACTAGAGGA 
36601 TAA6CTCAG0 AGATCCAACA CCAAATGAAT AGGA6CTCTG AAAACATAAA 
36651 ACQCX3AGT6T ACAATATAAA AAAAAATAAA GAATGCTCCT AGTTCTGAAG 
36701 CTTACATGCA TCCTATTSAA GAAAAGGTCC AASTAGTGCT GGGCACAATA 
36751 AATQAAGTAC TTCTTTCCAA 6ACATACCAT CATAAAG6GT CAGAAGCCAG 
36801 GGATAAGGA6 AACAATtTTTA AAACTTT6AA GGAAGAACCA TCA6AACTAC
36851 ATAOAACTCC TCAACAOTAA CTCTAGAAGG TAGACGATGG TGGAAAACAC 
36901 ATTCAAATTT CAAAGOGAAG ATTATTTCAA CCTAGATTCC TACCCATGCT 
36951 AACTAAATAT CAACTGTGAG 6GTGGAATTA AGAAGTTTAG ACAAGCAATG 
37001 ACTGAAAAAA ATGTACTTCT QATACCCTAC TTCTTAGGAA ACTACXTGAG 
37051 AG6GTACCTC AGCAAAATGA GGGAATAAAT CAAGAAAGT6 GAAGACX3TAA 
37101 GACCTGAAAC TGrTAGTCCA ACACTAAAGA GTOGTATCAG ATAATCCCAA 
37151 CACCATAQCT CTGCAOCAGG CTTAAAGTAA CCAGCTCGAA TTTGAGCAGA 
37201 AGTAAGAAAA GATTGTGTGT ATGTGTATGT GTATGTGTGT ATGTCTGTGT 
37251 GTGTGTGTGT GTGTGTTGAT ATGGTGGAAC AGCTTCAGAG GAA6T7AAAG 
37301 AACTAACAA6 CTATCTOATG TCCTTGAACA TTAGTAAACA TTATTGTGAO 
37351 GTGTTOGTAG ATCTTTTOGA GCATTCAGCA TTTACCAGGT ACATAGAAAA 
37401 CTATCCACAT 6AAAAAAAGA GTTGTGTTAT TAATTCTAG6 AAA6CAAAAA 
37451 AA6ATTTCT6 TAATCCAAAT ATGTTACTTG ACTCTTCAAT TAAXAAAATT 
37501 TAGACACTGG TACTAAAT6T AGGCTGTTAA TTTAACCAAA AATAGAGATG 
37551 CTATAATQXA AAGATGTGGT 6TGGAAAAGT T6CAAAGAAG TTGTAAAACA 
37601 ACTAAATCCX; TAACTACXSTA AGAGAAAATA AATATTTACT GTCTAAACCT 
37651 AGAAGCTGXA ATTTQAGCAT ATTATCTAGT GATAAGGAGT TAGATACTAT 
37701 AAGAAATCAT TAAACAAGCA TGAAGTGGCT ACCTCTTQGA GAACAGCTTG 
37751 OGTGAGGXAA CATGG6ACAT AACTGCTTTT CAAGCCTCTT CATGTTTTTT 
37801 CGTTTTTGCC TTTTTXAACT AAGTGCTGTT TACTCTAACA AAATAAATTT 
37851 TATTTTTIAA ATGTGAAAGT TGAACCTTAA GGCTCTTT6T AAIATTAAAA
37901 TCCSCTGTCTC AATTAATTAT TCTXTTGTTQA TAGTCTATAC ATGTACTGTC 
37951 XAOTAACAAA ATATGTGATT CATCAAAArrA TCTTAAATAA TGAGCTTTAT 
38001 GTTTA6CTAA TTTTCTTTCT TTTTTCTTAT GTTTTTATTT TTAGGGTGAT 
38051 TCT6GGGQAC CTTTAGTCAC AAGGGATCTG AAAGATACGT GGTATCTCAT 
38101 TGGAATTGTA AGCTGGGGAG ATAACTGTGG TC31AAAGGAC AAGCCTGGAG 
38151 TCTACACACA AGTGACTTAT TACCX3AAACT GGATTGCTTC AAAAACAGGC 
38201 ATCTAATTCA GQATAAAAGT TAAACAAAGA AAGCTTGTATG CAGGTCATAT 
38251 ATGCAIGAQA ATTCAACTAT TTAGTGGGTG TA6TACAACA AAGTGATATT 
38301 AAATTACTGG AXCXAGTAAC ATGAAACACA CAACGTAAGT XATTTAGAAT 
38351 CACTTTAATC AACCAAXAAT CCTTAGCCAA TTTATAAGGG ACTTTTATTT 
38401 GXAAAGTAAT GGATCTGGCT TOAAAAATAC GGTAGAGATA CTTAGCTCTT 
38451 TAAATCACQA ATCTTQAAGT ACCAGTGAGA CTCAATACAT ATTTTTGAAG 
38501 ATAGTCCATG OGATTTTTA6 AATQTCGTTG TCAAQGGTCT CCTTTTAACT 
38551 QAGAAACTTT TTGAACTCAC AAAGTGTTCA AGAAACCCTT GZATAATTCC 
38601 CTACATTTCT CTGGAGCTCA CAAAXACTTT ' rm ' JL ' Crm ' TCCTIATTCA
38651 ATCAGATTTT CCAAAOTACC TTTCCACCAT AAGAAAT6AA TTTTCXACTT 
38701 CIACACCCAT TTGAGAGACA CCAATAAAAG AAAGTCATAT GTAGGAAACA 
38751 AAGTCTOATA GTAAAACAAG CCAGAGATCT TCTAACTTTT TTTAGTTAXA 
38801 AAACCTCTAA TTTTTGGTQA CTTTTCTACA CACACACACA CATA (SEQ ID NO:3)

FEATURES:

Start: 3000

Exon: 3000-3093

Intron: 3094-4905

Exon: 4906-5024

Intron: 5025-17485

Exon: 17486-17553

Intron: 17554-19507

Exon: 19508-19668

Intron: 19669-25382

Exon: 25383-25421

Intron: 25422-26622

Exon: 26623-26794

Intron: 26795-30319
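
As a worked check on the feature coordinates listed above, the following minimal Python sketch computes the length of each listed exon and confirms that the exons and introns abut without gaps or overlaps. It assumes the coordinates are 1-based and inclusive, which the listing does not state explicitly, and it covers only the features shown in this portion of the listing. The names FEATURES, feature_length, and is_contiguous are illustrative and are not part of the original disclosure.

```python
# Features as listed above: (kind, start, end).
# Assumption: coordinates are 1-based and inclusive.
FEATURES = [
    ("exon", 3000, 3093),
    ("intron", 3094, 4905),
    ("exon", 4906, 5024),
    ("intron", 5025, 17485),
    ("exon", 17486, 17553),
    ("intron", 17554, 19507),
    ("exon", 19508, 19668),
    ("intron", 19669, 25382),
    ("exon", 25383, 25421),
    ("intron", 25422, 26622),
    ("exon", 26623, 26794),
    ("intron", 26795, 30319),
]


def feature_length(start, end):
    """Length in bases of an inclusive, 1-based interval."""
    return end - start + 1


def is_contiguous(features):
    """True if each feature starts one base after the previous one ends."""
    return all(
        nxt[1] == cur[2] + 1 for cur, nxt in zip(features, features[1:])
    )


exon_lengths = [
    feature_length(start, end)
    for kind, start, end in FEATURES
    if kind == "exon"
]

print("exon lengths:", exon_lengths)
print("total exonic bases listed:", sum(exon_lengths))
print("features contiguous:", is_contiguous(FEATURES))
```

Under the inclusive-coordinate assumption, each exon length is simply end minus start plus one, and the contiguity check is a quick way to catch a mistyped coordinate in a listing of this kind.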
Beyond ORF(3')



Context: 



DNA

Position 

72 TTATATTCATAAAAGTAGGCAGTAAGTTGAAGArTTAT^ 
AGCTTTAACCT 
[A,G]

TGOCTTCTGTAGCTTTTGTAATCTGGCAGTGCGCATC^^ 

TCMAABGAGAAACACTCTAACaACTTATCACCCT^^ 

AGATGCTCACAGCTTCirrCX3GrrGGGATTTGAAG^ 

ATGTCAATGGGTATTGAACCACTCTTCAGCTCTGArCCCA 

TGACXATGTGTCTTGGTGGTGGGAGAaxSTGATTCl^^ 

1894 ACCTGGGCCCTTAAACAGATATCCTCTCTCTCArCCT 
GTATCMTCCTGCCTXSACrCTCM^ 

TTGGGTCTTAGCGGTAAAAAGA.TGAACAAGGCTAATGCAGC^ 
AACSTGAACATACATGGAAACTAATACXTGATTCAATGT^^ 
AQGTGCC3W5AAGAACAGCAAAGAGTTATTTTTTCCTX^^ 
[C,T]

CCOOGTGTGATGCAATAT?U^AATACACAGCACX3lCCriTTC^^ 

TTTAATCAAJUWTCTAATCAAGACTTCAGAGCTAAA^^ 

TAGGAAATGAGGGATA1!AAAAGA/ICIU\GTTAAATAATACC^^ 

GTCCAQAAASTAAGAIATTCTAAAGGATGTTT^ 

TTAAAAACTAAAAAAG?lACX:AGGACTCTTTTAGAr^^ 

1897 TCGOCXXnrrAAACMATATCCTCTC^^ 

TCATTCCTGCCTGMnCTCATAGATrTflTATQArrCXr^^ 

GGTCTTAGCGGTAAAAAG^TGAACAASGCIAATGCAGCC^^ 

TQTkACATACATGCAAACTAACACTTGATTCAAI^ 

TQCCAQAAQAACMCAAAaASTTATTTTTTCCTCC^^ 

[C,T]

GGTGTQAIGCAATimkAAAIACAatfXacC^ 

AACCAAAATCTAATCAAQACrTaVQIUSCTAAAGAA^^ 

QAAATGAGGGATATAAAAGAACAAGTTAAATAATACCACA^ 

C3W3AAAaTAAGA3ATTCT3UU^GaATGTTTASCTT^ 

AAAAClAAAAAASIU^GCAtJGACTCTTOTAGA 

2123 TTGCTOATCATAGQTGCOUIAAGAAC^ 

AAAAimTTArCCCOGGrrGTGATOCAATAT^^ 
TTGCCAAAraAATTTAACGAAAATCTAATCWl^ 
TIUmXSU^TTTAXAGGAAATGAGQGAmTAAAAGAACAASn 
GCMTCAGACAAGTCCAGAAAGTAAGATAT^ 
[T,C]

AQTCAATQTCATTAAAAACTAAAAAAGAAtKMaACrrc^^ 

AftgqCftmaCAAACAACnXKACnXXATGGTC^^ 

TGTGTAArTATAAiaAAACCATGGAGGGAAC^^ 

GGGAOAAMArCRrrAATTTTTTAGGAGTGTO^ 

CCTAArTGTCTATAATAATGATTTQGTAAAAAGTCyi^^ 
2124 TGCTGATCMAGGTGCCAGAAGAACAGCAAAiSAGTO

AAATTTTTATCCCCGGTGTGRTGCAATATAAAATACAC^^ 
TGCOUUITGAATTTAACCAAAATCTAATCAAGACT^ 

AATCCAATTTATAGGAAATGAGGGATATAAAAGAACyU^GTTAAATAATACCACW 
CATTCAGAOUVGTCCAGAAAGTAACaATATTCrrAfi^ 

[G,A]

GTCAATGTCATTAAAT^CTAAAAAAGAAGCAGGACTCmrrAG^ 

AQGCATAACAAACMGOVKauntSCATGGTCCrr^^ 
GTGTAATTATAATOAAACaiTOGAGOaAACTTQAAC^^ 
GCaGAAATATCATTAATTTTTTAEXSJUSTGTTA^^ 
CTAATTOTCTATAATAATGATTTGGTAAAAAGTCACGATGT^ 

2648 GTTATGTTGGATATATCCTAATTGTCTATAATAAr^^ 

TATTTCACATTAAAATATAGCaWSCAGaUAAAATAAATGAGC^^ 
AACAATTGAIATAATAATGTGATATATATATGGATGTT^ 
TTTTTATGTCTGAACATTTTCATAATACTTAAAAATAAAA£^ 
GAGATAATAGATTTAAAATCACTTTCrrAAACrrCTAAAAGGATA^ 

[A,C]

CMAGTGCTCGAGAAAGGAESGAArGGTCCCTTTTCAAGC^ 

GCTGCTAAGAGAAACCATTCCTGACCACCACAAAGAGGCCACCAAA^^ 

AAAGCAGGAGCAACATTAGQATTCCCAGATCCTGJ^ 

AGACCAAGATGACATTQAACAAAATTAAAaACCTTrrTQO^^ 

AACITOAACTTGTCTAAGCaftGAGCTOGAAAAC^^ 

2805 TCAATTATACTA3TCrrTAGT3VATTTTTl^^ 

aaagataaaagataaaaaxaaatgagataatagatztaa;^^ 

AGGATAGACAGATAAAAGAGATAACAAA£TrGClt3GAC3^^ 
AtXMGrATGCCACCTTGGACCarOCTG<TI^^ 

GGCCACCAAATGCCTCTAAAATAGAAAGCAGGAGCAACATTAGGA^ 

[A,T]

rriVrrri'lTA ACACAICTTCTCAGACC^ 

TGCAGGGAAAQGrCAOGCTACAGCAACrTGAACT^ 

AOCATTGCrCATCTOAGtfUSrAACX^^ 

CACCOGAAGCAGtAAATGCTGAAGCXamSGATGAlTlt^^ 

Catfm3GTGGCaMrK5ACCATAGGTCrrCC^ 

4036 TTCTGGGGAGAATGCRAGCCATTIACATTTTTTC^^ 

ATGGATOTATGTOATAAAACAAATAACTCSWSGCTOCTC^ 
TCACCrrTC3^CAC3A£n'CAA!rGGGGC^^ 
ATGATAGTCCT/^CCTTAmTOATAACCCCAGC^^ 
TCTCItfsGTCACTTTATTTGGTTGCAT?^^ 
[A,G]

CZUSTATGOATTATATOOCnAAaTAATCAGGATS^^ 

ATTTCCCACTTAABACATA3TKXrrTCCTO^ 

TOGAAACCC^CTCCCTCTTCCXTOAGra^ 

TTTCCCATTCTCPATCTTTAACTCTG^^ 

GATXACaUWmrCCTTAGQAGTCTCAACTGCl^ 

5056 GATATAAAACArrAACTGTXATTTTTTAAAlS^^ 
A37OTCAAGATTTATATrQGCCCCATTG™OT 
TTAGTTTCCTATTTTTCATTTCCATTGC^^ 
TTAAAATTTTAGATCXSWCAAArrCAATAM 

AGGACTTACQAQAQACGACXraAAAATTTGQTGAGTCAGGTAA^ 
[-,A]

TAATGCAAGTGGAAGGGATTTTGTGGATCATTTCrc^ 

ACCGCCAACATTAGAATCSmrrTGCSUlATT^^ 

GASTAraATQABAIGGCnASGTGGGGAGAGGAGAGT^^ 

TOGOTOAlTCTAATAAtKXrrcrrCTTTC^^ 

TGAGGravAGArATCCmGCXXXnTTCTTCC^^ 

5445 TTGCTAGGCCCCATCOiZAaACXrrGCTTM^ 

AGGAOAQTAAGGQAATCTGCATGTCTAACAAATGGGTGATTCT 

AACrCAGCrCACCTTATTEAAAGGTAAOAGAATTGA^^ 

TCCyXlPATTCCACCACGVl^ 

TAAGTCCACACAL"riXJX*ITGTAAQACC»CATTTO 
gttoltcttgtaagtatartcsataaagwsuvaaaacc^
tcaaatgctaataattttcnrraaagctaaagtgcaagac^^ 
cacactcagttgtataatcattccactcagaatgcc^ 
gttcaaatt/u^tttttctaacaaggaagcacag;^ 



5608 TATCCTAGCCCGTTTCTTCCCCAATTCCACXZAGG^^ 
ACCAAAACXAGTrPTTTATAAOTCCACACACrm 
ACTATTTTCAGAATTTACXJrTCMCTTGT^ 
TATTTT(3I3U7rAATCAACn*CAAATGCTAA3:AATTa^^ 
TCCX3UUU^AGAAAAAAAGCACACrcAGTTCnA^ 

[T,C] 

TCTCACTC!AAAAACTAGGTTCAA7irTAATTTTTCTAA(^^ 
TTATTTTAAAAAOAAAGAAATGACAAATGTATTGGTTT^ 

TAAGgvcAcriUirivixJcauvATCArgrACxai^^^ 

CATAGTATACCTAATGGCATCMATTTACAATAATATTGTAGA^ 
TCftGTTAACATTAAATCATTaVCAA a rir r 

6243 lTC'Xn*i'CAC3m3CaVGAGCATCTrATAAMlGRG 

ASGAIt3AAOC3QGGAgCCTGCACCRATAC!ACXX3UUW^^ 
AC?TX3ACTCCAC3lTAACXrTCCTOGATGCMAA^^^ 
AAA6ATAAACACACCTTTGAATGA!IX3GAAAAT^^ 
TTGTTTCATTTATATTTTATGGCCAACATTACTX3CT 
[G,A]

AGAAATGCAGAGGTGCAIGTTGAACAClAAACTCTATr^^ 
CCn!AAGCATGTGTTCCTTCAAAGGCTAAC36C^^ 
GGGPlOCTCCAftGGCCCrTCTCiW r XV^ 
QCftJgUWSCCCTTACXXrrCXXX XJC ^ ^ 

6273 AGCarrT6CaATC3W3TTCTTAAGTTAT^^ 
CXMATACXrTTCTCTACTCXnrCX3^ 

AAGAGAAAACTCTTAACITGCCrTAGTTAAAAAGATAAAC^^ 

ATCnTACAATTTACTGGGAAATTTTGAAATTTGTTl^ 

ACTGC13 Unxm X? iTClWi 'AAGTTAACTRGGCI ^ 

AAGAA3XK:AATAC3GTCnTAAAAGAAmX3AGAGAAAT^^ 
CTCmTTTftAAAtnGGAGTTTTAAGTTTC^^ 

CTAAGTTAAGTAAGGACACATTATCATCATCGGrrACCntKAAGGCCC^^ 

ATTATTTATTTAXCCTCCTTTMCACC^^ 

AAATCATTglMGTrrKMGrrC^ 

6294 AgrTATGCPWSgftTGRAOGGGGAGCCTGCA^^ 
CC3UJItXri!fUtf3TGRCTCXavCATftAC C TCC^^ 

TTT!raAAJOTPTGTTTCOTTTATATTTI3iTGGCX:A^ 

aottaactsuqgcaattctgtctttactg;^^ 

AGAA6TOIU3AGAAATOCAGA6GTOCATGTTt^^ 
TAAGTTTCACCrAABCATGTOTTCCTTCA^^ 
TArCATCATGQGTACCTCX3UW3GCOtriTC*l'Ci^^ 
ATC3^CCATRGCATR?UKXCTTACCCTCCCCCCTTC 

GGTA rivinn ' iXJ ' jriwrA TTom'crTACAAMv^^ 

6312 GGOC3AC3CCTGCAOC3UITACACCCAAATACCTTCT 
ACATAACCrrCCTC3QATGC3UUUUU3AaAAAACTCrn 
CAOlCgrTTGAATGATGGBWATQTTACA^TTT^ 
TTATATTTTATGGCXaACaTTACTCCTAC^ 
GTCTTTACTQAAGTAAACGGACAAGAATGCAATAGGTCn^ 
[A,G]

OAGGTGCATGTTGUUlCAQAAACTCTA^rTTAAAAGTGGAGT^ 

GTGTTCCTTC3UUW0GCXIUU3GC1AAGTTAAQTAA 

CMGGCCXmCTCTOGTOXn'CATTATTTATl^^ 

CTTACCCTCCCCCXTrTGCAGGAAATCATTCTATt^^ 

TTCAl'AVl"lAC3UUlAAlA: i\JITi " i ' I tXJAT1^^ 
6506 CAACATTACnXSCTAgrU U 'l tJ ' iWX 'ltJ' r AAGTTAACT^

AACJU3AAACTCTATTTAAAAgTGGJ UJri " lT AAgl^ 

GGCTAAGGCTAAGTTAAGTAAGGACACATTATCATCATGGGTACC^ 

TGGTTGTCATTATTTATTTATCCTCCTTTATCACCATAGC^ 

TTGCAGGRAATCATTCT AltariU XaiTGTGGTATT Cni ' ^^ ^ 
AArAlXi'ri"l"rGCTAlTlTl"ltJCGTACALTlTl\iC"l^^^ 

ri ' rixji ' ri 'CATCT Cuvrri ' A CTGAGAA crn ^ ^ 

TTACnTTATTGCTGTTAGCTGCTAAlTCATAGTGTGT^^ 
CATXX:CAAGAAATGCCACACTAAACAGACrrCCTACT^ 

6714 TTATCMCATGGGTACCltK!AAGGCCCTTCr^
TAlXI^WrCATAGCATAAGCCCTTACCCTCCC^ 
TGGTATTCTTTTGTTTGTATTCSOTCTXAC^^ 
TGCTXTTAACTTACATTTTOTGTTAT^ 
CTTTTTAAAAGATATATCrrTACTMATArAC^ 

[C,G]

ATAGTGTGTATCTTCCATATTTACXrrGCCTGTCaT^ 

CTCCTACTTACCCCCTTATAGACCTAIGCAAGTACT^ 

TTSftATQTACATATACTTAACTTGaCCg ^ ' lXjtfXXj ^^ 

CKMTCTGCACX3CCCATCTAa^^ 

ACTTAGTGTCTTAGTATCnTT3UX3ClACTAC^ 

6815 AATCATTCTATCnTTCAUtJlXjtnAri^^^
TIXKrTA l ' ri " l X j C Gl ' A CI UJU. " l^C ' rX ' ^ 

CAT C T crrrri ACixaAGAA CTOiri ' r AAA;^ . 

TTGCTtnTOGCTGCTAATTCATAGTCTCTATCTTC^ 

GftAATGCCACACTAAACAGACTOCri3U:TTACO^ 

[G,C]

AAtX^AGAATTACTAGQTCSWrTGAATGTACATATAC^^ 

TGCTCnrrCAAAATGGCTGJ^CTCSUSTGTGOWCXX^ 

CmXXXACATCTaUVCCAACACTTACjnsr^^ 

ACCATAGGCnXSGGTATCnrrAAACAACAAACAATTA^ 

ATTa^AAGATGM^GATGATCAAGGCTCTAGCAGAr^^ 

6994 ATTGCTGTTAGCTQCTAATTGATACntSTOTATCr^^ 
AGAAATGCCACACTAAACMACrCXrrACTTACCXX^ 
GGRAGCaGAATTACTAGGTCATTGAAltnACATAT^^ 
TTTGCPCTTCAAAATGGCTGACnxaWSTGT^^ 
ATCTCCCCACATCTAACCAACACTTAGltTIXriT 

[A,G] 

T!ftOCMa«3QCTGGQTATCT13UU^CAACAI^^ 

GATTCCIUUaATOAASATGATCftftGGCICTAOCAGAT^ 

TTCAJAGftATftCCMVritA^^^ 

GIAAGGACAC13UVTGACTTTCAIGA6IU^CTC 

<XXXXATCTCCTCTATCJWTOGGTTTGGSA^ 

12478 TTCTCATTTCCCTXnATCAGTTTXra 

AGAAAQAIGAAGaTUVGCTGATGGAIAISmXSI^^ 
TCZAXTUnTGGAAGAAAGGTGTGCSATGGGTATGCTTTT^ 
AAGTAA33mSAACTATTTCa3lAftTTTCX^ 
AGGAQAAlCTAUTlTAGTTTATCaTCATCATT^^ 
[T,C]

AAAAOGCISU^QaUUUUlTGATTCTCTCTC^ 
AOAOUkCAAATCTOGAaAGGAGAimACCn^ 

Aix;c"iuorriu'AAATAU"rrriyujTA^ 

T TTTAC MUUtfKg^ATTCAAAQATCTAG UlV^ 
TCTTTTAAAATCXOTAAaTOCAAATCTITOAATTC^^ 

13493 QATCCAAAAAAAAAAAAAAAAC3lCCTAaAGTTTTATACAaATATG^ 
GACTTGCACrAAAAACTACCAAaATTATGATTCTTA 
TGCCTTTOGAOACSQGaTCCAACAGTTTCTGAT^ 
GTGGCmUKlAGGTCraAGTGMAACCTACCTGCATGCr^^ 
AGGCGTTAACAGGCTCTGAAGCTACATGGCCX^ 
[T,G,A,C]

TQGGCAAGTCACTTCCrrCrmnMXSAAACG^ 

TGIOTTOAAAGC3U\AroA6CTCAAA^^ 

ACAACAGTGATTCCaWTrATTATAATTATTACJ^CTO^ 

ATAATCAATTACCTAAAATGTtX^AAAACAGGAAAAAAAAATCTCTTCC^ 

TOTAA U " XVrL " I " i * XTri ' C ' TCTA GGAGCATTOATCrrCAACCTGATC 

13522 GTTTTATACAGATATGATACGAACTTAAAAGGACroCACTAAA;^

TTCTTATTTTTGGAG3U3TAAAGAAAATAGGCTGCCTTTGGAGAG^^ 
QATCCTCTTACTiAACTGCTTGCTGCCXyVTCAGTGGGTAGGA^ 
CTGCATGCTCATCCTGAGGTAGGCACTGTGAAGGCGl^ 
CCCTGGTTTCAGTGAAC'l'C'lXjlXXrrGTCAACTTGC^^ 
[C,A,T]

GTGAATAATCATAGTACTCyVCCTTAGAGGGCTGATTTGAAAGCAAAT^ 
ATGACamrTGTGCITCGTGCATATATGGCAGACAAa^^ 

tacagtcttaccaaggaggagctttccacaaataatcsuit™ 
aggaaaaaaaaatcrcttccgataattcatgtgtaattttc^^ 

GATCTCakACCTGATGTAAAGCAAGCACTTTAAAAA 

13916 AACAGTGATTCCX!ACrATTATAATTATTACAGTCT^ 

AAraVATTACCTAAAATGTCCAAAAACAGGAAAAAAAA 
TAATTTTCTTTTTTCTCTAGGAQCATTGATCTCAACCTC^^ 
AAGTCTTATAAAATTTTCCTGGTAAATGCAAAACTTTC^^ 
TTATGAATrTGTTAJVTTCAACAAAAATATACXACATA^ 

[T,C] 

TAGATTTTATAGACTATGAAAAGATAAATTGCCATCTCT^^ 

TAATAAAAGAGACTATATATTTGCATAAATATATAGTGA^^ 

TATATGTTlACArTAAAGAA!I7U\AAGGTATAAGAGG6ATJ^^ 

AAaAaW30TC3WtnTTGAGATTAAC3GAAXATCCOCau^ 

TTGAAGGATA GTTO l^TTCAGGAACACAGAAC^ 

13974 AXAATCAATTACKTOAAATGTCCAAAAACAGGAAAAAAA^^ 

TGTAATTTTCTTTTrrCTCTAGGAGCATTQATCTCAACCTGATCT 
AAAAOTCTTATAAAATTlTCCTGGXAAATGC»AAACrTT^^ 
TTTTATCAATTTGTTAATTCAACAAAAATATACTACATACXZAACAC^^ 
TGCTAGATTTTATAGACTATGAAAAGATAAATTGCCATCI^ 
[A,G] 

TTTAATAAAAGAGACTATATATTTCCATAAATATAT^^ 

AATATArrGTTTACATTAAAaAATAAAAGGTATAAGAGGGATAAGA;^^ 

Gai^AGACAGGTCAGTrTQAGATTAAOGAATATCCCCAAA^ 

CCTTGAAGQATAta riVI t i ATTCaQGAACACa^^^ 

ACCAAAGGAACAGCCTGAGAGGCGTGAGTATGCAGGhAAArrGAC^^ 

15081 AATQGCTQQGCQAGTCTGTTTGTTTGAGTTGACIUX^ 
ATCCAACIMCCTTCAATTOOCXZTCTTQGAACTTAATC^ 
AAAATTATCAAGCAGAAAGAOATACITICXZCTGAAAGA^ 
GACAAACTCCAACTAC3UUUVTTC17U3AAAI1X3CCCT^ 
AAATTGCTAAIGCTATTAGGTTGmXAGATM 

[G,A]

CrrTTAAATATATAAGTTTCTCTGAAACTTCTGOGAACTTC^^ 

AAAGAATOCTTCniAATTATGAAAOCCATCATCTGCC^^ 

GAAAGCTAGTTTATACATAAGCTCCMTCTAOUV 

ATTTTCCTCXrrGCnXSTAAimXaTTTTATCAa 

ITTlHJlXI RAAGO CaTimXXri 'CAGACrACCl^ 

15907 AAACTTGATCX3U\TGCXn7a\CCAAAAAOT^^ ' 

ACTTATTCACSAGOiGTAATTACTUU^^ 
TriXAJI ' l tnaUATTGTATCACTTCTCTCCC^^ 
GQAATCCCTCACenXZATACTGAGTAGl^^ 
TTATAACAAAGTCACXXnTTCAAAAACATGTCTTCC^^ 

[A,G]

AACCCXACACX3UX7rCAGCTAAATGGGGCTTTCT^ 

TTTGGATACTAOClAATTTATTTTCCAAATGCTATCn^ 

CCaAATCTATATCrCTACAAGTTTTATACTTTA^ 

ACTATG TOl ' I CTACAAAAQAAACCXSaUtfyrAAAATTT^^ 

TGTaATTQA6TGGGAAaAGGCGGA(XXn*ACAGATAGAAGACT^ 
17884



17908 



20551 



21222 



21232 



21353 



AAGACTGGAGGGAAAAGGAACJU^AGGAGACAGGGACTCTCATOT 
GAOTAGGCTTTTGGCTAGtfArTTrrCATAAACATT 
GGGCTGACCACCGTTTTGTCAAQ^AAAAGACTAAGAl^ 
CAAAGTTCACCAACTGACAGTTTCCCAAAGT^ 
TTATTOTGAGGCCTGGAACCTACCAGAACCCSVTO^ 
[G,A]

TTGCATGCACCAA£3TTMATTATCrrTGACAATTM 

AAACTTGGCTATAAAATGGGTTCACATATTTTACC^^ 

CATGCCrAAACAAAAAGATATTCCTGTTGTAATAAATTTTC^^ 

GOIU^GACrCMATCRGTTGCIUSATAT^ 

CTACMAGCMAAOVCQCATGTCATATACAC^ 

GGAGACAGGGACTCTCATOrATTOrATOTCrCC^^ 
TCATAAACATTACCTnrTAAAt3CAGTCriX3AACT 

AAAASTVCTAAGATTCAGaAAGGCn'AAGAAATATGTTCAAAGTTCACC^^ 
CCAAAGTGAOVQAAOCAGGAATCAAACCCa^TT^^ 
AGAACCCATGACGTOGGGAAAACXXIASCAGCr^^ 
[G,T]

TTGACAATTATATTAITTCATUIXIACG^ 

GAAATTTTACCTQTAATGTAAtXaAATGACATAAGGCAT^ 

TGTTOTAArAAATTTTCnrrTCTOT^^ 

attgctcagaagtttcaattgtgttattttgaa;^ 

TAT7VC3lCAAArCCArGASCCTGIArGACT^ 

ATTATACC3OTATC»CTTCCCTCAArTAAGGAGA^ 

ATGGCCTTTACCTTAAOTAACTAATTTCTTT^^ 

CTITATAGAACTRSAAATTCSUZAC^ 

AGAATCAAATTTAAATCTCTAATrTCTTAAAAt^^ 

CAAAGACATCTTAAAAATTCTGGCTTTGATACTOT 

[T,C,G]

TCATATTAAIUUmtn!IUU36C»GCC3^^ 

Gcw iuu x j CAca^saa' crri^j ' iU 'CxaAc^ 

GGAXACCAGACCrrAAGTGTTACyiGA/^GAG^ 

AACAGGAC^-CrrCTGCCCrrrGTAAGGGTCT^^ 

AACnAAAAAArarGaGGTAATGGAATGGQCAGATGl^^ 

T LVriXjnriA GATGGQGaACSaU^ 
TTACATGATGCTTTCTGTGTAATTTCXZAATAAA^ 
CTGAACTATCTGAAACCAGAGTARAGC^^ 
TTTTTGTAGGCTCAGCTTCTAAACTTCAGCTTA'^^ 
TAGGATAIQCnUlTGGAGAACXTTaArTTG^ 
[G,T,A]

GCXrrftTAATTCCACK3 UriTlX 3GQft^ 

AAGACCAGCCIGGCXIIUVXATGCJTGAAACC^^ 

AGGCCTGGTGAOGGGCACCTGIiUITCCCAGCrD^ 

TTGAACCX36GGAGGOC3GAGCnTGCIUSTGASCa^^ 

GCTTOACASAGCAAaACTCXATCTC^^ 

ATQGGGAMSCAAATGAAXT^GAAgTTGTGAT^^ 
CUU 'lV iVi tf i ' A AITTCCARTiAArACmr ^ 
TGIUUtfX!AaACn!RAAGCATAAftTTGTTCA^^ 
CTa«KrrTOTAAACTTCAGCOTAa^^ 
TAATGGAGAACCTOATTTaAOACmiACCrraA^^ 
[G,A,T]

CGAGCACTTTGOaAGGCOGAGGCGGGTOGATCACCnXI^^ 

AOGGGCACXTGTAATCXXAGCTACTTGGaAGACnXS^^ 

GMSGOGGAGGTTGCAGIQAGCCAAGATCXSC:^^ 

GChAGACrrCCA3X7nX3^AAAA2A^ 

GRAAOOWa?tf3TMtf«K3VTRRA VXm ' r CAT^ 

TCAGCTTCn!AAACTTCAGCTTATTTTi^^ 

AATGOASIUVCCnXSATTTGAGAGTCACCTGAOC^^ 

CCAGCACTTTGGGAGGCOGAGGOGGGTGGIITCAC^^ 

TGGCCAATAT66TGAAACCXX30TCn*CTTCTAAAAATAau^^ 
[C,T,A]



21904 



22132 



22369 



22742 



22862 



cgggcacctgtaatccca^ctacttgggagactca^ 
aggcggaggttgcagtqagccaagatcxkxxx:actg^ 

CAAGACrCCATCTCCAAAAAAATAAAAAATAAAAGAGTTACCTGACX^^ 

CTAAGTOVCCACSUSGACCACCCAAATAATTGGCTCATGCCTTT^ 

TGTAAAATTCXZAATGGTAATGTTTGTTCITCX^ 

CAATGCnrAATGTTTGTTCTTCCTGAAATCAC^^ 
AGAAAACACAATGTGAAATAAAQA06CTGTrACTAA3X3A^^ 
ATGCTTTGGAAACCriX3AAATCATTAAT^ 
CTTaaAAGTTTCAGAATGTTCAATCMrAC^^ 
TAGGGGGICTTGCTTTTCTGGCCAGAAACCT 
[C,T,A]

G"iU TT G crcTGGcccAcnxKy Krrixj ' rxxntj ccxAc^^ 

TCCGCTACCAGCCrrtSGATCCCATGCCCACCAAGGCC^ 
TTGTCTGAGOGAGCACAGGGrCrGGCCACIGrc 
GA0GGGCAGCTCCAGGCAClX3GCACAGGTGTGCa^^ 
AAAGCrCACTGCAAGCAGCTTCCCTGGCAGGCACCT^ 

GATATGGGAGCXri3U3QGGGTG'l"lWrL"^^^ 

TGc x- ' irri ^ccAA ij ' rrritj CTCTGGCCc?^^^ 
ACTGTGCxxa^ccrTCCGcnaiccwcxxn^^ 

GGAGCTQTGAGGQTTQTCTGAGCOAGCACACSGCT^^ 
CTGGCTGCABC3m3ACC5GGC3U3CTCC3tfK3^^ 
[T,C,G]

CTGTGGCTGGACAAAGCrCACTGCSJUQ^^ 

GCACCCAGGAAGCTTGGAGATGCCiyCSGAACrGGAl^^ 

TOGCTTGGGGAGCTCCCAGGTCTQGQZmXXnaUi^^ 

CCCACAATGTGGCCAGCAAGGGGTATGTTTCA!^^ 

CTTGCTATTTGGCAGGTCOTGAGTTCTTGT^ 

ACACrGGCTGOUSCATQACGGGCAGC^^ 

GAGGCTGT6GCTGGAC3UU\GCTCACTGCAASCAGCT^^ 

GGTGGCACCX:AGC3AAGCTTGGAC3Aa'GCCAGG^^ 

AO^CTGGCITCGGaAGCTCCCaUXmr^ 

TTTACCCACAATGTCGCCaGCAAGGGGTATGTTTC 

[T,A,G,C] 

AGT CnritX rrATTTGGCftGQTCCTGA UlUVrr^ 

AGACAAGTGGAGGGTGAGCAAGACXSAAGAAAGGTTTACrG^ 

GAGACCCACAGTGGGCAGCTCCTCnTCAT^ 

TAQOUUUSACSGABGCCCrrGGAGGTAGAASCT^^ 

TGTrcAGCTTTCAGCACACTlGTAGGCAGTAGG 

OGTOIUKAAGAaSAAGAAAGGrTTACTQmSCAAGA^^ 
GGGCAGCTCCrCITCAXAQCXAGGCTGTCCCAAC^ 



AGCACACSkSTASGCAOTAGGCXXTEAGAST^^ 

CATGGTcrcccAGrrcAcxntntxskxcT^^ 

CXX3UXX!CTCCGTGCCTQ3«X3UUKOTC^ 
ATTGTGGTAGCTCCCAGKaGTGQOtfSGC^^ 

TcmrcAccTTcrccccsicxxxxrrc^^ 

CCA6GTTCOQGASCA6QAGAGGCTCCAOGCXnX»3(^^ 
GTTGG6GGTGGCAC3WCmXK3CTGCCTC»GGa^^ 

CTCTCTGCACSGCAOCrrTOTCCTG^^ 

CCCTAGAGTQGTCTATCTCCTCnxrTGa^ 

TCCATCTGCAAGGGTCCAATGCTGCXrrCCakGCAC^^ 

CaUU3CIGCTCCCCCACC3WC?^^ 

TGGCAGGCTCrrGGGGaGCrrCCXAGGaATGGGCTC^ 

CXritXX7r6CIUnx;(KX:ATG(mAAGAA3t3^^ 

G6CTcau3QCxr^GGG^cx3UK?^cxrIGCC^^ 

CrrQCXTTCAGGaATGTGQGACSUrASGGaACC^ 

TCCKKTTACCACrKKrrCCAGAC^^ 

ATTCRCmyatfWCiAGCTCaQGAAAAIXnrrTACGl^ ^ 
23316 GTGGGACACAGGGGACCCaCCACCATaU?rGCTACTC^
GCTCXAGACAOCCTGTACKnXSCXIMXZ&CTAG 

AATTTATGGACTACCAATAAATAGAAAACTGTAGJWSATTCTAGA^ 
TCCTGTi^CCAAGATTTATTTATAATTTGTCMGAA 
[A,-]

AAAACTGTGTOGTGTGGGTCCTTCAGGAGAiCA^ 

CTTCTTTGCATTGCAAACACCAAGGCTGTAGTC^ 

GACTTTGCTTCArrrTTCArCAinATAC^^ 

CrPTTCTCCXSUlCrrCAGAACOOTQra^ 

TTTATTTCCaTTTCrATACCAQGAAAGTAAAAATCr^^ 



23867 TTTCTAXACCAGGAAAGTAAAAATCITTGGTCAAAATT^^ 

CTTGTGTATTGACAGTTTGTTTCCAGaTGTAAT^ CCTTAAAATCCGGTTATATT 
au:X3ACXaTTATACTTATCCTXy?rATCAri^ 
GaVCXMGTTGACAAAGTTTCAATTGAAGAATTCT^ 
GCATTACAAAGGACAAAATATATAGTTTTCTTAAAAATGAJ^^ 
[A,G,C]

TACATTTGAaGOTAAACTaACnTCCrn^CGATAGA 
CraAOCAATTGACTCTTCACCATACAATQATT^^ 
TGAATATTTlX!IUUUWCK7FCtX3lC^ 
TTTGl!AaAAAGTTA5TAaAAIGAAACAA7CTTAT^ 

agaatgigtctqacsaaacatggcactggcacsg;^^ 

23954 TGTAArCATTCTCCCTTAAAATCCX3(SrTAT^^
ATTCXrrOGAAAItSGCTAACTTGCATCCKKr^^ 
GAATTCIAACTTTATGCTATTTTCCACTTTATTGC^^ 
TTCTTAAAAA!It3AAA£AAATTTACTGCX:TTAAAC^^ 
TCCATABAaaAACXAClAAauSCAAlX^^ 
[A,G]

TGATTTGGGATG<XrrPTAAGGC7IATATTT<^^ 

TAGAOTTIATCATCSWCXAGTTTrCCCCAGTOGAAT^^ 

ATCTTATTTTGTATIUmSAGGAATAGAAXA^ 

GTAGGAAAAAGTAAACAGTTTArTCTCArCTGCTGAATAAOC^^ 

AAArcATCAAAATTTTCATGAAACCTTCaVCCAACTTTATTTTTCCCC^^ 

26548 agtggc3wsaaattagacx::aggaottggtggtacc^

CnAAAAITAC3AA3rrCCAAAGTAGAGAAAGATATi^^ 
GTGATTAATGCTATGACAGAGGAAGTGCATAGTGCTATGA^ 
TAACCTOITCTCa^CACAGXAAGAAAGTGAAC^ 
ATCCMTGAatf3aTX3GGQT70tf a rGT^^ 
[G,A]

GGCAAGTAAQAATOGGCmSirTTCCTGIMTTACI^^ 

TCATCTTTTTTCSUOGTTGTGGTAAAOQAGT^ 

TGGAGTCAITIXKACCX3UU3GOGOCXn!GGCClTIX^^ 

CCATCAOTGTGGGGCX3^CCTTGATTAG13UlCACAT6QC^ 

CGAGAAOIAAGTTATTGACCnTOAGTTAaAACXXaOT 

26573 OGTOGXAOSekTTGTGAAXAAATUSWIG^ 

AAASATAIAAATAAArCAGGAACTrATGAAAAXAATCTGAr^^ 
TOCATAGTGCTATGAaACSTTGATCAaAaAGTCAaCTAAC^ 
GTGAACCCTGAAATGTGAQ2U3AaiUUGAGGCC^^ 
CClt3GO(3U3aAC3GAGTAGTAXACaAAAAltm^ 

[T,A,G,C]

axnAATTACAAGATQTTTCTTTVTAACTT^ 

OGACTTTGrrcaCTTAAACOTCAACM^ 

TGGCCrTGGOUytXrrTCCCrTCAOTATGM^ 

AGiatfkCACMXSG L ' ritflXIA CTQCRgOia^^ 

TTAGAACXXSkCTTCTGCTAAAAAGCXXTOAGTTT^ 

27400 ™aViYrixn'CftAACAATGClCTCCACTT 

GGGCGAQTCTCATACTGATCTTAAATAATCAAACnS^^ 
CMTAAATGCOOaAAGTTGGTAACCGTQATGAaXSQWM^^ 
ITOACAa^ATGAAOATCTGTGGAATCAGAACAGTTTAC^ 
ATGATAATUSACAGGCSUnrTCAAAAQAGATTCCTCGGAOT^ 
[A,G,C]

TTGGGCCACTCAATGTGACCrrC(XATAATAGAGCATCT(^^ 
GAC3UU^GCTGA7WCnt5ftAGJATAGCMATTGTOCT 
CATrri'ATTAAATATATQATTATUVTGAiJ'rrri"!^^ 
AATAACCAGT<3VlUTl'in7VTa^CTATTACATTTAQAATTTTA<^ 

TAAcnCTATCxx: LU " rrc ' iTcrr cATTC 

27788 TTGTOCTATCCTATAATTGTTTCTGAATGCATACi^^
CTTTTTATAACTTTTAATCTTACTTTTCAAGATAA 
CATTTAGAATTTTACaATTTGTTTCTAAGTAGATTAAC^^ 
CCAATTATTACAGTAATAACAAAGACnrrCTTGAGTATCTCT 
aATTTAOTGGQAAAAATATGTCCCAGOCAGTTGGAGAGCri^^ 

[G,-]

TGTATTAGGTAATAGATAGGCTAGATCTTTTCACATTCTTTT^ 

CTTTTGTTACTATAATAAATTTCATTTGCCTAGGA^^ 

ATATTCCAAAGAAIATACTmiTTAAQAIVTCrAGGCTT^ 

TCCC»GCATTTT6GGAC3GCCGAGOCAAGAGGACC^^ 

CTTOGGCAAGATAGTOAAACCCCATTGGGCATGCm^^ 

28069 GGCAAATTATTGAACCTTAGTCnATTAGGTAATAG^^ 
TTTGACCTOnVAAATTCTAACTTTTOTTACT^ 
AATCTTTArAGAGACTCTTAATATTCXAAAGAATATACA^ 
AaX3Cro3GCrCAItKXnxn!AATCCX3VQCA^ 
GCTCAGGASTTCAAGACCAGCTTGGGCAAGATAGTGA^ 
[T,G,A]

TACCTATCATCXXIAGCTACTTGGGJUSGCTAACXXS^^ 

TGAGGCriTXnXSCAASCTATGAITGCACC^^ 

CCCATCTTftAAAAAATAGrC^^AlSlTTT^ 

GAAAGATGIGAGJU5CTC»GZ?UWCXnX3ATAXATT^^ 

(JlXritJGlTrrrJCAAAGTAAT^ 

29269 CATATATGTGTGTGTSTATATATAAAAAAAAATAl^ 
CCTCATTTTTCrAQMCCAACTTCCMATGC 
ATTCX3AGALriTlWIX3WLriWVirra 
CAGATGTCn^GACTAOTOICACACyiTT^^ 
AATAATYTXftAATTGGTGGGAOACAtrtGTTTTi^ 
[C,G]

ATATGTGATtnOIATCaAATAGTTTAACCT^ 

CACAGTCTTTACTAGATGATCTTTCATTC^^ 

ACCATAAIYXZACXXIATTGCCXAAGCCXAC^^ 

GATCMCTCTCftAAGGJ^CTAIXX3tfma^^ 

TTCJ^AGAATCTACTCCCCAGIUUVGAAmAACAI^ 

29537 TTTATAAAGGGlU^aAGCAGAGAaABGCAGGCAGATAT^ 

CCTATCXAGGCTTTATTTTCCrrTAAC3TATAAAACACASTCTn 
TGCTACIAIU^TGATTTTTCOGATTCC^^ 
AC3UWSCI3WSAAGTCAACGGC3lTTTACCACArT^^ 
ATCTAATAGACrrTlSUCXACATCCS^TTCTTG^^ 
[C,A]

ATTGCGAXATCACTTCSUVrAAAATTTIVAGC^^ 

ATCTACalXXiL'lXJCiriAOCTCTCCAAQCTO^^ 

CTGCAGCCCATCCAACCCAAGACCTTGaaAlTl"!^^ 

CCTa\CAcraACX xri^rirriri 7^ 

29726 AAGTC3UVC0GCATTTACCACATTTGA7CATCT^ 
ACTTTACX3kCATCCATTCTTaACCTTC3U^aA^ 
TTTTRAAAATOTAAATGAGaiCTACATTATT^ 
ATCACTTCAATAAAATTTAASCACTTTATCATaACC^^ 
CXrrGCTTACCTCrKXaUVGCTCAC(XX:CAACX^ ^ 

[T,G,C]

ATCCAACCr3UlGACCTTGGGAU"riTl"ltJCCTGGAIV^^ 

GACCCTCrrTTACTATGTCTIAGCCCAAAlXSOQTTATCAAA 

AGTACrCTATTCCGTTACXXrrATTTTArrTTOTTCAT^ 

ATTTATCTOTTTGTTTGCrrGCTTTaATCCTT^ 

GAQGAGGGACCrTAGQTCCTGTTCaiTakC^ 
30496 AACTATAACTGCaW3CCAAC?rATTCTCAGGAl^^

ATTAATCTAGCTCATATACTTTGGGCAGCTTATATATATTC^^ 
CCAGGTATAAAAATCXyiCATCARTGGACrGTTAGTTTTGGAACAA;^^ 
TAATGAAAAGAAATGTCyiGAAGATTTATTATCXyWTGAGtf^^ 
AGTACXaAGATTGCTGTTGTGCAGGTCTCTTCCAGAGTCACCT^ 
[C,T,A]

GATTTGTTTGCCAGaUWSCCTCTGCamCTTCCyA 

atttcsgagcactttactatogtggtoggtatctcacsgatagcta;^^ 

TGTCTAAGGCAATGTQATTTCATCTCCATCAATATTATCC^^ 

TCTGGTTGGATTAGTTAGGGTTCTTACTTTGTGT^ 

GTGCaU3AATAAAAAACAAAGAAAC3UU«UlCTTCCACAAATTT^ 

30695 AAGATTTATTAICXATOAGAAETITICQOCTCTCC^ 
GCAGGTCTCrnXX3^3AGTC3lCCTTTT^ 
CTCTGCATCCTTCCMCCAAATTTGACTGTC^ 
TGGTGGTGGGTATCTCAGGATAGCTAACAGAGCGCri^^ 
TTCATCTCCATCAATATTArCCItaACftGCCAT^ 
[A,C,G]

GTTCTTACrTTGTGTGACAGAAATTC^ 

GAAACAAAAACTTCCACAAATTTGGCTO^TGTAATTTGGAAG^ 
TTTCACTTCAGACACAGGGGTTTATArC^ 
TITTTTGCCCCTTCTTTTCTC^^ 
TGTCAGAGGnXSGCTAACAACACCTCAACACATCyiTCCT 

30752 TCTCCAGGTCTCTTCCAGaWQTtaCCTTTTC^^ 

AGCCTCTtKMCCTTCCAACCAAATTTGACTGTCCACA 
CTATGGTOGTGGCromrKaGGATAQCTAAC^ 
GATTTCATCTCXATOUlTATTATCXnX3ACJ^^ 
XAGGGTTCTTACTTTGTtSPGACAGAAATTCMT^^ 
[T,G,C]

AAAGAAACAAAAACTTCCaOUUVrrrGGCrCAl^ 
AAtflU*rC3WCTTCASACAC3^GQQQTT13M!AT^ 

TAGTGTCAGAGGTGGCTAACAACACCTaUiCACATCATC^^ 
AGAAAGGAATATTTATTTCTTTTCTTTGCCAG^ 

30849 ATCACAGGATTTGGAGCACTTTACrEArGGTCGTGGGTA^ 

CTAAGCCCTGICTAAGGCAATGTGATTTCATCTCCATCAATACT 
CCACaiCAGTCTOGTTGCSATTAtmiAGGGTTCma^ 
ATTAACCAGTGCAt3AATAAAAAA<3UUU5AAACAAAAACTTC^^ 
ATTTGGAAGTCAAAAAAGTGTAGTAAGTTTaW^TTC^ 
[A,T]

TCTGGCrcreTSTCTCTSAAT^^ 

TTaW3AGGGATGCTAGCTTCACCrAGTGIOVGAGGTC3GCTA 
TCXrra\ACAAAaMUUWUlTACATAGAAAGGAATA3^ 
ACaiTAATTTCTATITOTTCCMKrTO^ 
ATATTCTTTAaXKXnMGTAGCAAAATTTGCTTC^^ 

30900 AAGAaAOCGClTUWKrCCTOTCTAAGQCAATGTGATTTCAa^^ 

otfKxaTTTCcacAOU if u xr i wir^^ ^ 

TOUVrTCACATTAACCAGTGCAGAATAAAAA^^ 
GCrCATSTAATTTGCSAAGTCAAAAAAGTGTAOTAAGTTTC^^ 
TATGATCmATCTGGCTCTGTqiXriXri^^ 
[G,A]

TTOGCTTCATTOUSAOGGATGCrrAGCTTa^^^ 

AACaU3^TCMCCnX3UVCAAAGAAAAAATACATAGAAA!^^ 

COWSAATTCaCATTAAITTClATTCTrTCCAGC^^ 

CTAACTCAAA3ATTCTTTATGCCX^^ 

TTrAAGTCnt3ATGC3TGAATAAGAAa3WSTOTAG^^ 

30904 GftGCGCTAAGCCCrOTCTAAGGCAATGTG^ 

CAj.x A^XACACAG'iVlW'intJQATTAgTTAG^ 

TTCaUMTAACGAGTOCAaAATAAAAAACAAAQtfUVAaUU^ 

ATCTAATTTGGAACnXIAAAAAACmTEAGTAACmTCAC^ 

ATGTCATCTGGCTCTGTCTCT C TOAATTTGa^^ 1 11 Cl 'CTATOTTQ 
CTTOOTTCAGAGGGATGGTAGCrTCACCTAGTGTC^^

CATCATCCTCAACAAAGAAAAAATACATAGAATWC^^ 

AATTCaCATTAATTTCTATTGTltXaMX^TCTC 

CTCAAATATTCTTTATOCCITlTGTAGCAA^ 

ACTTCSrGATGGTGAATAAGAATAGTGTAGAGATAAA^ 

31664 TGAGAAAATGCAAAAGGGCITTCTGAGAATGACrAAATCTATl^^
TTTATTTACATACAAGAAATTATAAAGAATAAGCT^^ 
AACTAGGAATAACCrrTTO^CACATAGGCAGGAATOGG^ 
TCCAGATGTCCCATGTGGTTTTQTTTTATCTTATAC^ 
TTAAGGriX 3 T A TTACCaATOimSAAAATATTACCTAT^ 

AGTGCTGCTCSrAAGCCnmCACCTCXXrrAGGTCTG^ 

AAGATAGGGCTTGCATrCTCrrGCTTCATA^ 

GC1TGGTG6GAAAAATCACTTTCAGGA(n*rT^^ 

ATGCGCTGTGAGTGGCAACAGAATCTGACACTTAtrAGA^^ 

GGCCTCrrCTTGGTQAGTOACCCACAGCT^ 

32014 GATTTTCTCTQAAGAXAOGOCTTGCA3pCT^^ 

AATCCCCTTTGGCTTGGTGGQAAAAATCACTTTCAGGAGT^^ 
TACCTGTCATAATGCGCrn?rGAGrrGGCAACAGAA^^ 

TAcrTGAACAOGGCcnxrrcvxtJGUXjAcnm 

TrAAIOTAlUCCTATCATTCrrAATCTGrrrAAGTAC^^ 

TTACrCACCAAGASAGGCTftlATrCAAGTC^^ 

ATTGAAATTCTTACAAAGTATATTCTCTGAT^ 

AGAAAGATAACAIAAAAATCXXXAAATGCrEACCA^^ 

GAATATCnXrGAAGAAATTTGTAAAAACAAATAGAAOT^ 

TATATGCXS^GArGCTGCXT^AAATAGTGTASAi^^ 

32197 TTQAACAOGGCCTCTCrrTGGTGAGTGACCCACAGGa^ 
AATTAACCTATOITTCTTAATCTGTTAAGTACATT^^ 
ACTCACCAAGAOAGGCTATTrnXZAAGTCr^ 
TGAAATTGTACAAAOTATATTCTCTGATCATAATGGAAT^ 
AAAGATAACATAAAAATCCXXAAATGCTTACCAATTAAAAAAC^^ 
[A,G] 

TATCrCGAAGTUU^TTTGrAAAAACAAATAGAACrA^ 

ATGCCAGATGCTGCTAAAAIAGTGIAGAAAGGGAAATT^ 

GSU^AGATATCAAATCAATAATTAAGTTCTCSk^^ 

ACCTAAAACAAACSIITUUSQAAGGAAAXAATAAGAA^ 

AAATAAACTATAGAAAATTGATAAATAAAAAOCTGATTATTTGA^ 

33074 TC3G6CTCACTGCAAACnXXXXXnXXXX3GGTTCACGCai 

GTAGCIOGGACTGCSVOGGGCCaVCCAGCATGCCCGGCTM^ 

GCCrrOQGCCTCXXAAAOTGCTGGGATTACAGGOGTaAG^^ 
T lflVriTA ATTGATGATATAGTAGGCgU^TAJatfVAT ^^ 
[-,T]

ATAA!rATATATAAACCAATTOTATItJU^TAACAG^^ 

AlTrrem AGTTACACACrTAAATCTTCCXa^^ 

CTTCAGIAAATAAATCTTGGAAATOGTCTTCTTU^^ 

AATGTACTTAGrrrTTTTTTTAATTGATGTATA 

ATAATGTOTTQAAGTATAGXATATGTACACItn^ 

33505 AAATCTTGQAA Al XXyiXrrrC' rA AAGAAACTOG^ 

GlYlViU " rriTA ATTQATGTATAAAATT<y3lTGTACT^^ 
GAAGTATAGTATATGTACACTGlXSAGlt^ 
TACATAATTATG ArrriTOiX XXaUVGAACACTTAAm 
GAAXACXSATATATCAACAGTAGGCAACHZAGAAGCT 
[C,T,A]

GGGAGATGCnXSGTCAACAAATTaVTATTTOCAGa^ 

TCaTCCATCATGGTQACTATAGCTGBlTGATATATCCT 

ATQTGTAAGAAATAATCACAAACftGTTAAAACAGGACrca^^ 

TTCATGAGTCAaACGTTCT^CACAGCOT 

CTGTAATCAAOGTGTCSUSCTGGGGTTGTGGCC^^ 
33551 TTCAftATGTACTTAGTTTTTTTTTTAATTOATGT^^

CAACATAATGTGTTGAAGTATAGTATATGTACACTGTGA^ 

QAAOCGTCnrrATTTTACATJATTATaiTrmXnX^ 

QTAGCGTTTCTCAAGAATACGATATATCyUVCAGTAGGakAOTl^^ 

GJUSGGGAAGGAGTTAGGGAGATGCKSGTCAACAAATTCAT^ 

[A,T]

GTroUW3AGATCTCTCATCavrCMGGT^^ 

TTAGTTTTTTATAAATGTGTAACAAATAATCACaAACAG^ 

TTTATCrCACTGTTTTCATGAGTCMAO^ 

AGGGTCTCACCAAACTOTAATCSUUBGTOrCAGCr^ 

TTTGAAGGTCTCCTCAAGGTTTGCrGGCAGAATTCCTTTACT 

33801 AGTTAGOGAfiATGCTCSGTCAACAAATTCAT ATTTGCACyrTAC^^ 
ATClXnrCATCCATCATGGTaACTATRGC^^ 

TATAAATGTGTAACAAATAATCACAAACAGTTAAAACAGCACTCAT^ 

crit r ririri tam3AGTCAGACQTTCAGACA^ 

CXyVAACTGTAATCAAGGTGTCAGCTGGGGTTGTGGCC^^ 

[C,A,G,T]

TCCTCaUWSGTTTGCrGGCAGAATTCCrTT^ 

TGCTTTAACTCTTTAGGAAAGTGTCTCAACTC^^ 

TCAGCTGATTAQGrK3W3GCCaUX^ 

TCATTAGMGTCTTAATCXXaTCTGXAAAATTCCCT^ 

AATCATGAGAATGGCATCCCTCATATTCACAGAT<X1^^ 

34648 TAmTGTATATTTCACATATATCTTATATATCTXaA;^^ 
AAAATAAAT(rrACATA0TATTATAGGC»TTTm^ 
CATGC»GA0TTTCTOG6AAC3Umnx;QAACCC^ 
GGCCTTCCTGAAATATATAAGCTGATTATTTTTAAGGTTAjC^^ 
TCCAAATGG CTLTlTi ' CVllX nOTGftGBU Unr^ ^ ^ 
[T,C,G]

TTAaATCSKKX3«3AirrXAACCAaAAAa 

TTCTGTATTGACaUWrrTTACCATCSRAAAAAATATTAGGAJ^^ 
GCCAAAGATGCTOATTGTAAATACrrAGAATAACrrC^^ 
AATGATCTCCX3AGAAGCCAGAGTGAAAATCATAAGTGAC^^ 
GT G " rA TGGCAATGATATAAAA<Xnxy3AAlXJriX^^^ 

34754 OAAAACCATCnTkGGCATGCAGAGTTTCTGGGAACAATCTGQA^ 
TACAAAAGATAAAAGGCCTTCCrGAAATATATAAGCTGATTAT^ 
ACCAGGAAAAAOAATCCAAATGQCTTTCrTOCTTTGAGAAQTT^^ 
GGACAATAATTATCGTTAGATGTG(X3VQATTTAACCAGAAA1 
CTTATATTAACTTCJanrCTCTATTGAC^ 

[G,T]

CTCaCTTCACTCXAGCC3UU«3AT^ 

AAGOGaAATCCa^TUATYSATCTCOSAGAAaCCAGAGI^^ 
GCAA(K!AACXACAGGTOTAT0GCAATGAI71TAAAA^ 
TGGAAGGAATTTATGATGCCTGCAGOGTAAGTTCSQA^ 
AAAAATTTGTATCTGGCnrEAGAATATATTA!^^ 

34867 AOATTTTACCAGGAATUUUSAATCCSIAAIOGCTTTC^

TGTGATTGQACAATAATTATCGTTAQATGTGCCACSAT^ 
GAAACTGCrTATATTAACTTCATTCTOlATTGACAATTTT^ 
JUUUaTCTTCTCACTTaWCTCTAGCCAAAQATGCT^^ 
TTTTCCTTAAGGOGtfUirCCaU^AATOATCT^ 
[T,C]

GATGTCTGCAAGCAACCACAGGTGTATGGCAATGATAtAAAACCIt^^ 

GGATATATGQAAGQIATTTATGATOCCrrGC^^ 

CTAACrCAAAAATTTGTATCTGGCTTAGAATATATTAT^^ 

ACAlMATATCSmiTCAGCTCAAAAAAaTTAC^^ 

TTTAAATGTTTIATTAAGAZAAATGAAGTAAaAaTTT^ 

35013 GTftTTGACAATTTTACCATQTUJU^AAATATrA^ 

AAGATQCTGATTGTAAATACrAGAATAACTCTATTrrTC^ 
A3CTCOGA6AAGCCftS3tfypGAAAA3qV^^ 
ATGOCAATGATAIAAAACXriXXiAATGTTC^^ 
CCrnX^AGGGTAAGTTQGAGQGATTTTTTTATATTi^ 

FIGURES 29/30

WO 02/26947    PCT/US01/29960

[C,T]

AGAATATATTATATGTTCTTTACATAAGGACAAAACATAGA^ 

AGTTAOUUiaWCAAATTTCACAGCACAAAATACTTT^^ 

AGTAAGAGTTTCTCrrGATGCIATCaAAOUUVC^^ 

aaa(3attaataaagcagtttattttctx:aagcggcto^ 

TAAACAGAGAAGTATAAAGTGATGTTATGAATAATATAATQAAAAGCAA^ 

35225 OCOOQATATATOGAAGGAATTTATGATGCCTCtaWSGGTA;^^ 

TTACrrAACTCAAAAA a ' XnOTA TCTGGCTTAGAATATATTAT^^ 
AAAACAXAGATATCATGTCASCrcAAAAAAGITAaiAAT^ 
ACTTTTAAATGTTTTATTAAGATAAATGAAGTAAGAGT^^ 
CAAAATTAGAATTTCTTAACCAiOAAATCCAAAGATTAATAAAGC^^ 
[C,A,G,T]

GGCTCACATTCAAOAAASAAAATAATCATAAACIAGAGA^ 

AATATAATGAAAAGCAAATATTTTTCTTGAAQGAAACATT^ 

GATGAGACGTAAATAAGGCCTGIUVGAATAAATAACATCCAAT^ 

TGTTATAGAAAACSACAAAAAGCATAGCCAAAATTATGAAGGTG^ 

TCTGAGGGAACTCCy^AGTAATTGGTTGGGTCTCAGCAT^ 

35517 TTCTCAACX:66Cn*CACATTCAAGAAAai^AAATAATCATA 
(yiT A T G AATAATATAATaftAAAGCAAAT ArririTC ' A ^ ^ 
TATCTUGAGTUGAiXSAGACGrrAAATAiy^^ 
AGAAAATAAT6TTATAGAAAAGACAAAAA8CAXACK;C^^ 
CftATTCATATCnX5RGGGgACTCC3U^ 
[A,C,T,G]

GAGAAACAAGTAGATAACCAT6AGUUU5GTGCSVTTAGGCX^ 
TCa^CAGTGCCCTCATCTGCCTTCTAACAT^^ 

COCTAGACACTTGCTTTTTAACATGA^^ 

GGAAATGATGCSAATGGTATTGATATCAGGTGGC^ 

ClCTACCACATGTCCGACCTCm^^ 

36885 TAAGAAAACTAG^lGaATAAGCrrCAGGAaATCC^^ 
C3lIAAAACX30GA(nYn7WCAATATAAAAAAAAA3^^ 
CATGCATCeilATTGAAGAAATUIGTCCAA^ 

ttcx:aagacataccatcataaaoggtc3U3a^ 
tttgaaggaagaaoiy^tcagaactacatagaactcot 

[C,G]

gaiggtggaaaacacattcaaatttcaaagggaagattattt^^ 

AT6CnMCrCAAATATt:AACTtnX3A6GGT^C3GAAT^^ 
AAAAAATGTACrTCTGAXACCCTftCTTCTTAGGAAAt^^ 
AATGAOGGAATAAATCAAGAAAGTOOAAGACXnTUU^ACC^ 
AAAC3AGTGGTATC3VGAIAATCCC7ACI^CC^ 

38527 AAGIUU^6CTGTAT6CAGOTCKrAXA3GGATaAG7U^^^ 
AACAAAGTOATATTAAATTACTGGATCISUST^ 
GAATC3kCn7XAATOUUX:AAT7ATCCri^^ 
TAATGQATCTGGCnriXS^UUUUll^^ 
AA13IACCAGTaAGACTCAATACM*ATTTTTGAAG^^ 
[G,A]

ttgtcaagggtcrccttttaacttsaoaaac^^ 
cttgtataattcxx:tacatttcictc5^^ 

TGAATCAGUmrrrCCAAACnACCTTl^^ 

CATTTGAGJUUU^ACCAATAAAAGAAACmiArATG^ 

AftGC!CAGRGATCTTCrrAA Crin ' in * n ' A gTT^^ 
SEQUENCE LISTING

<110> PE CORPORATION (NY)

<120> ISOLATED HUMAN PROTEASE PROTEINS,
NUCLEIC ACID MOLECULES ENCODING HUMAN PROTEASE PROTEINS, AND
USES THEREOF

<130> CL000862PCT 

<140> TO BE ASSIGNED 
<141> 2001-09-27 

<140> 60/235,557 
<141> 2000-09-27 

<140> 09/734,675 
<141> 2000-12-13 

<160> 4 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 1225 
<212> DNA 
<213> Human 

<400> 1 

cgcccttatg ctgaagccat ggatgattgc cgttctcatt gtgttgtccc tgacagtggt 60
ggcagtgacc ataggtctcc tggttcactt cctagtattt gaccaaaaaa aggagtacta 120
tcatggctcc tttaaaattt tagatccaca aatcaatttc aatttcggac aaagcaacac 180
atatcaactt aaggacttac gagagacgac cgaaaatttg gtggatgaga tatttataga 240
ttcagcctgg aagaaaaatt atatcaagaa ccaagtagtc agactgactc cagaggaaga 300
^ggtgtgaaa gtagatgtca ttatggtgtt ccagttcccc tctactgaac aaagggcagt 360
aagagagaag aaaatccaaa gcatcttaaa tcagaagata aggaatttaa gagccttgcc 420
aataaatgcc tcatcagttc aagttaatgc aatgagctca tcaacagggg agttaactgt 480
ccaagcaagt tgtggtaaac gagttgttcc attaaacgtc aacagaatag catctggagt 540
cattgcaccc aaggcggcct ggccttggca agcttccctt cagtatgata acatccatca 600
gtgtggggcc accttgatta gtaacacatg gcttgtcact gcagcacact gcttccagaa 660
gtataaaaat ccacatcaat ggactgttag ttttggaaca aaaatcaacc ctcccttaat 720
gaaaagaaat gtcagaagat ttattatcca tgagaagtac cgctctgcag caagagagta 780
cgacattgct gttgtgcagg tctcttccag agtcaccttt tcggatgaca tacgccggat 840
ttgtttgcca gaagcctctg catccttcca accaaatttg actgtccaca tcacaggatt 900
tggagcactt tactatggtg gggaatccca aaatgatctc cgagaagcca gagtgaaaat 960
cataagtgac gatgtctgca agcaaccaca ggtgtatggc aatgatataa aacctggaat 1020
gttctgtgcc ggatatatgg aaggaattta tgatgcctgc aggggtgatt ctgggggacc 1080
tttagtcaca agggatctga aagatacgtg gtatctcatt ggaattgtaa gctggggaga 1140
taactgtggt caaaaggaca agcctggagt ctacacacaa gtgacttatt accgaaactg 1200
gattgcttca aaaacaggca tctaa 1225

<210> 2
<211> 405
<212> PRT
<213> Human

<400> 2

Met Leu Lys Pro Trp Met Ile Ala Val Leu Ile Val Leu Ser Leu Thr
1               5                   10                  15
Val Val Ala Val Thr Ile Gly Leu Leu Val His Phe Leu Val Phe Asp
<210> 3
<211> 38844 
<212> DNA 
<213> Human 



<400> 3 

ttatattcat aaaagtaggc agtaagttga agatttattc atataggatt tagtagctgc 60
agctttaacc tgtggcttct gtagcttttg taatctggca gtgcgcatct gctatattat 120
ctaaatgttt cctcaaaagg agaaacactc taacaactta tcaccctagt ctgctggcca 180
ccattttccc tcagatgctc acagcttctt ccgtgggatt tgaagatatg acttccatga 240
cacttgatca gtatgtcaat gggtattgaa ccactcttca gctctgatcc cacggttcag 300
ttcctttcag tgtgactatg tgtcttggtg gtgggagatg tgattctttt atctactttc 360
tccatttatc ttactcagag gaactgtgct ctaataggga aatagattga aagcttataa 420
atttccttga gttttaactt ttctcctttg gtcttttttt cttttcaaat gacttgaaga 480
cacattgata agattctatg agaaaatgaa gagttgaaca aattgaatat gtatgagtga 540
atgaatagat taatacataa atgataaatt tattaaataa tttgaacgaa atcaatcgag 600
aggcaccgag aataaatttg tgtcctagaa gtaagaagac ctgagtttga gataactagt 660
agttctatta tactggagaa attacttaat catcactgga cttcattttt ctcatatgga 720
aagtaattca atcacactaa acaatcttta aggtctcctt cacttataaa tgtatgtttt 780
aagccattta ggaggttaaa taatgtcatg tcccatggga cttctgtttg ttgttctatt 840
caagcatgtt agcttgtttc tatcacagga cctgctgcct ttccgcagcc agttctctag 900
attattttta atcagtcggt gcacacatgg tcaatattta ctcaatagaa ttcaggtttc 960
ccaaattcca tgaggattct tgattaattt tattacttat gccaaaacta ttatcttctt 1020
aactatttta ggtccaaaca gttttaactt ttatcctggc atttatatat aaaaaacttt 1080
tgtaagaccg ggtgcagtgg ctcatgcctg taatcccagc actttgggag gccgaggtgg 1140
gtggatcacc aggtcaggag atggagacca tcctggctaa caccatgaaa ccctgtttct 1200
actaaaaata caaaaaatta gccgggcgtg gtggtggacg cctttagtcc cagctattca 1260
ggaggctgag gcaggagaat ggcgtgaacc tgggaggcag agcttgcagt gagcagagat 1320
cacaccactg cactccagcc tggcagcctg gatgacacag cgagactccg tctcaaaaaa 1380
aaaaaaaaaa aaaagaaaaa aactgtttta tagtcaaaag aaaaactttc tataaatcaa 1440
ccaatcctgt gaagaaaata tgaaaaatat cctctgtttc caaaaaaatt taggctatca 1500
atatatacac ataaagagat aaactctgat aaattggata aataaaattc actataatag 1560
caagttttag agaacaagca cgggagttag tcgacctggg cccttaaaca gatatcctct 1620
ctctcatcct gtgttatttc ctgtgtaatg ttggtatcat tcctgcctga ctctcataga 1680
tttatatgat tcctactctg tccaggtgcc ttattgggtc ttagcggtaa aaagatgaac 1740
aaggctaatg cagcccattg agaagctatc tgtaagtgaa catacatgca aactaatact 1800
tgattcaatg tgagaagcac tgttgctgat cataggtgcc agaagaacag caaagagtta 1860
ttttttcctc caaaattgtg gaaaaatttt tatccccggt gtgatgcaat ataaaataca 1920
cagcaccacc tttgaagtat tcttgccaaa tgaatttaac caaaatctaa tcaagacttc 1980
agagctaaag aaaatctaaa ggtaatccaa tttataggaa atgagggata taaaagaaca 2040
agttaaataa taccacagga aagcattcag acaagtccag aaagtaagat attctaaagg 2100
atgtttagct tgatctcttc aacagtcaat gtcattaaaa actaaaaaag aagcaggact 2160
cttttagatt aaaagagatt aaaaaggcat aacaaacaag tgcactgcat ggtcctcgat 2220
tatgtcttgg cttttacaaa tcatgtgtaa ttataatgaa accatggagg gaacttgaag 2280
atggactggg tattagatga tatggcagaa atatcattaa ttttttagga gtgttaagag 2340
tatcatggtt atgttggata tatcctaatt gtctataata atgatttggt aaaaagtcac 2400
gatgttttat ttcacattaa aatatagcag cagaaaaaat aaatgagcca aatacagtaa 2460
aattttcaac aattgatata ataatgtgat atatatatgg atgttcaatt atactattct 2520
tagtaatttt ttatgtctga acattttcat aatacttaaa aataaaagat aaaagataaa 2580
aataaatgag ataatagatt taaaatcact ttgtaaactc taaaaggata gacagataaa 2640
agagataaca aagtgctgga gaaaggagga atggtccctt ttcaagcatg tatgccacct 2700
tggaccatgc tgctaagaga aaccattcct gaccaccaca aagaggccac caaatgcctc 2760
taaaatagaa agcaggagca acattaggat tcccagatcc tgatattttt tttttaacac 2820
atcttctcag accaagatga cattgaacaa aattaaagac ctttttgcag ggaaaggtag 2880
gctacagcaa cttgaacttg tctaaggaga gctggaaaac ctgcaagcat tgctatctga 2940
gagtaaccag tgggcccttc cttttctcag gacagtggga tttggcaccc gaagcagaaa 3000
tgctgaagcc atggatgatt gccgttctca ttgtgttgtc cctgacagtg gtggcagtga 3060
ccataggtct cctggttcac ttcctagtat ttggtaggta aaattaaaga tttcactcta 3120
tttgatttta tttttctgca aagctccatt tacatatatg taaatgtaac ttcatctaaa 3180
aaattgcaca tttaccttca aatttccaca gagtatattt aactgtttca gtcatttcat 3240
caacaaacaa gtactaaatt cttattatat gtgagtactt ttctggatat tcaagataca 3300
gctttaagca aagtagacag atttctaatt tccttagagc tctcaaccca gaattctttt 3360
gagaatctac acaaaaagat caaaaattgt aattgtctga aacttactag taattataat 3420
aaacaactca tcacttatta tatattaaaa tgaaaagcta tgataaatta gttattaaaa 3480
ttggctcttt tactcatgaa ccatcatttt ctgtccaaca tttctaaggc aaaagaaaaa 3540
cacttgtcta ataaaataag gaatttcaaa atgattgaaa acctatacgt atgacacaat 3600
attatcattt atttttagag aaaaaaaatt ttactctttc caaaacaata ttcagggatt 3660
atatttttat caactaatat atttgtaatt acacaaataa tgcacttcaa gattctcttt 3720
ttacattcag tctctttctg gggagaatgc aagccattta cattttttca caaatctcta 3780
caatgtgact ctcacatgga tgtatgtgat aaaacaaata actcaggctg ctcactttaa 3840
cgctcttatc tgctgtcacc ttcacagagt caatggggga gcaaagactc tacttggagc 3900
cttaaagggc ttaagatcat agtcctaggc cttatatgat aaccccagct gtagtttata 3960
ccattggcaa aagattctca ggtcacttta tttggttgca taaaagtctc tttacaatga 4020 
gagtaaggtt tgttaacagt atggattata tgggtaagta atcaggatgt ccaaaaatgt 4080 
attacaaggt ccagagattt cccacttaag acatatgcct tcctgatatc cctgtttctt 4140 
tccttggttt gtagtctcga aacccactcc ctcttccctg agccaggctt ctcaaggatt 4200
gaggttgttt tgtatttttc ccattctcta tctttaactc tgtatctttc ttactccctc 4260 
tgggccttac tcctcagatt accaaattcc ttaggagtct caactgcttt cctttcttac 4320 
atttcctaat agatttatcc ctgtttcatg ctcgtcttgt cttcaatctc agacagctct 4380 
tctctacact ttcttttcag gtttttctta gtgtgcctgg ctctcttgtt aaaaatcaaa 4440 
attcacaagg acattcactt atctctactt ccactagagt gtatgatggt acacatttca 4500 
actcagcaag gagcaatgta gcaatgaaat gttcaagctc tacagctaga ctggatttaa 4560 
aacttggaca ggccacctac tagttacaga acaatttact taatgcctct gtgccttaat 4620 
ttccttatct gtaaaatgaa ggtgatacca atcttagaga gctggtgtgg ggattaaatg 4680
ggctaataca taaaaagtgc acaggacagt gcctgccata ttgtagaaac tcaataaatg 4740
gcagctatta taattgatat aaaacattaa ctgttatttt ttaaataaaa ctcaattatg 4800
aagaggctca gggacatatt caagatttat attggcccca t^gtaattga gttctgaaat 4860
Gtttgtccaa accatttagt ttcctatttt tcatttccat tgcagaccaa aaaaaggagt 4920 
actatcatgg ctcctttaaa attttagatc cacaaatcaa taacaatttc ggacaaagca 4980
acacatatca acttaaggac ttacgagaga cgaccgaaaa tttggtgagt caggtaaact 5040
tctttttatc atagaataat gcaagtggaa gggattttgt ggatcatttc tccatttcta 5100
aaaacatgat tttcagaccg ccaacattag aatcatcttg cagattgcta ggccccatcc 5160
cagacctgct taatcagagt atgatgagat gggtaggtgg ggagaggaga gtaagggaat 5220
ctgcatgtct aacaaatggg tgattctaat aagcctctct ttctaactca gctaccttat 5280
ttaaaggtaa gagaattgag gccaagatat cctagcccgt ttcttcccca attccaccac 5340
gtttcccctg tagaaaagcc taatcatacc aaaactagtt tttataagtc cacacacttg 5400
tttgtaagac cacattttaa gattttgagt attttcagaa tttacgttca tcttgtaagt 5460
atattgataa agacaaaaaa ccagacttat tttgtagtaa tcaagtcaaa tgctaataat 5520
tttgttaaag ctaaagtgca agactgctcc caaaaagaaa aaaagcacac tcagttgtat 5580
aatcattcca ctcagaatgc ccatgaactc tcactcaaaa actaggttca aattaatttt 5640
tctaacaagg aagcacagaa gcagagactt attttaaaaa gaaagaaatg acaaatgtat 5700 
tggtttgttt taatcaaaga accattttta agacactttc tttcccaaat catctaccat 5760
tttttcctgt catcatttgc tctttgtcca tagtatacct aatggcatca tatttacaat 5820
aatattgtag agtttataat ctctattttc agttaacatt aaatcattca caatttctta 5880 
attttgtggt ttcatctttc ccaaccaata attaatgtct acagattgat atagattctg 5940
cattctttca catgcagagc atcttataaa agagcatttg caatcagttc ttaagttatg 6000 
ctaggatgaa cggggagcct gcaccaatac acccaaatac cttctctact cctccagtcc 6060
taagtgactc cacataacct cctcgatgca aaaagagaaa actcttaact tgccttagtt 6120
aaaaagataa acacaccttt gaatgatgga aaatgttaca atttactggg aaattttgaa 6180 
atttgtttca tttatatttt atggccaaca ttactgctac tgttgttgtt gtaagttaac 6240
taggcaattc tgtctttact gaagtaaacg gacaagaatg caataggtct taaaagaagt 6300 
gagagaaatg cagaggtgca tgttgaacag aaactctatt taaaagtgga gttttaagtt 6360
tcacctaagc atgtgttcct tcaaaggcta aggctaagtt aagtaaggac acattatcat 6420
catgggtacc tgcaaggccc ttctctggtt gtcattattt atttatcctc ctttatcacc 6480
atagcatziag cccttaccct ccccccttgc aggaaatcat tctatgtttc atgtggtatt 6540 
cttttgtttg tattcattct tacaaaaata tgttttgcta ttttgcgtac acttgctttt 6600
aacttacatt ttgtgttata aatcactttt gtttcatctc tttttactga gaacttttta 6660 
aaagatatat gttactaaat atacctttag tttattgctg ttagctgcta attcatagtg 6720
tgtatcttcc atatttacct gcctgtca^ ccaagaaatg ccacactaaa cagactccta 6780
cttaccccct tatagaccta tgcaagtact tctggaagca gaattactag gtcattgaat 6840
gtacatatac ttaacttgac caattggtgc aggtttgctc ttcaaaatgg ctgactcagt 6900
gtgcacgccc atctacaatg catgaggatt tctatgtccc cacatctaac caacacttag 6960 
tgtcttagta tgtttaggct actacaacaa aaaataccat aggctgggta tcttaaacaa 7020
caaacaatta tttctcatag ttctggaggc tgaagattcc aagatgaaga tgatcaaggc 7080
tctagcagat gtctggtgag agcctgcttc ctggttcata gaataccatc ttgctgtgtc 7140 
cctcatggca gaagccataa gagaactttc ttttgtaagg acactaatga ctttcatgag 7200 
aactccaccc tcatgaccta actatcctcc aaaggcccca tctcctctat catcggtttg 7260 
ggagttaagg tctcaaaata taaatttcag gggaacacaa acattcagtc cacagcactt 7320 
gc^attattt ggctttctaa atttgccacc ctaatatgta taaagtagta ttttatttgt 7380 
gatttaattt gcatgtttct aattactaat gagtttgtgc attgttacgt ataattatta 7440 
actttttgga ctttcatttc tataaattgc ctgtacatat tatttgccta tttttctgtt 7500 
aaacttgctt tttcacctta tttgtattgc tttgcagaag ttctttacat tttctggata 7560 
ttgatagtgt gttggttgtg gacactgcgc ttatccattc tgtcttctac taatatggac 7620
cgtgttgttc tttatgaaac cgaaatctgt aactgaagta atcatttttt cactgttttg 7680
ccttatgatt gtattttgaa gcttttcttt aagaagtcct tcttcccttc taagacataa 7740
aaatatttta ctatgttact tattaacctt atagttttat cttttacatt aggtctccaa 7800 
tacatgtgga atccaccttt ggatgtgtta ggtagattca gttttttaat tcatatagtg 7860 
agccagtttt tgaatataac tagttaaaat atcttggctt ttcctaatat atggtattat 7920
tattgagttc attgcatgca tttcttggca cctgggtctt gcagaaaagg aaacatgaat 7980 
ctgtctcctc aaattgcttc caatcttttt ggaaagatgt gagtaacaca catggaattg 8040
aatatcatga catgatataa ttaagggcta aattacatg^ tgaggacagt aagtacagaa 8100
aaacttcaaa accaaacaag ggttcccatg gtcagaaaag gctttatatt attttacctt 8160 
tgtttaaatg agacaggtgt ttttctcctc ccatcccgca ccaggttagc tttagaagaa 8220
ttacaggaag agtttatgcc tcatcctgag ccacacctgt ttgttgttgc taaatcccaa 8280 
tgaatacaac cagattcttc tctctgtcct atatgggtgc taattagaca accaaggaag 8340
aacaggttgc acgtcctgtt cttcctcaca ttgggcttta ctgatttgaa tgcaaattga 8400
gatgcaaaag taaaaatgag ttcatattta gatattgcta taatccgccc ctgttccctg 8460
agatagtgga gcagacatat ctcatctctc atatcattct tcagagaagg gtccattaat 8520
cagacattac tgatgtctga ttactgccgg ctggccatcc tgcaggtgga gaagcatggc 8580 
atccagcaga aactgacagc atgcactttg agggagggaa ggataagcca ggaatttatg 8640
ctgaataagc tgcctaagta tacatgttca ataagttcta ggggaagtca caaatactta 8700
tgaaaggaga aacataacta tgtgcaattg agctttatgt ctcttcatgt gttgcatgtt 8760 
caaaaaatgg tggcattagc atgatccaag ggtggagttt tcagccattt gatgttcaaa 8820
ggtgaagcag aggacacaaa acccttacta tgcatcctct gtgagtcagc caaaaccagt 8880
ctggactgct agctagatta acaaagaaaa aaagagaaag aagatacaaa taagcacgat 8940 
cagaaatgat agaggtaaca ttacaaccaa tcccacagaa atacaaaaga tcgtctgaga 9000 
ctcttatgaa cacttctatg tagataaact agaaaatcta gaggaaatgg gtaaattcct 9060
ggaaaaacac aatcttccaa gattgaatca gaaagaaatt gaaaccctga acagaccaat 9120 
attgagttca tacttaaatc agtaatttaa aaaacttacc agccaaaagg aaaaaaaaag 9180 
gcccaaacta gatggattca cagccaaatt ctaccagacg tacaagaaat agctaggacc 9240 
aattctagtg aaactattcc aaagaattga gaagagactt cttcttaaat cattctatga 9300
agtcagcatt accctaacgc caaaacctca caaagacaga atgaaaaaag aaaattacag 9360 
gccaatatcc ctgatgaaca tagatataaa aatcctcaac caaataccag caaaccaaat 9420
ccagcagcac atcaaaaagt taattttcca aaatcaagta ggctttattt ctgtgatgca 9480 
agactggttc aacatatgta aatcaataaa tgcgatttac cacataaacc gaattaaaaa 9540 
caaaaatcat acaattagcc aggcatggtg gctcacactt gtaatcccag cactttggga 9600 
gaccatggtg ggcaaattac ctgaggtcag aagttcgaga ccaacctggc caacatggtg 9660 
aaaccccatc tgtattaaaa atacgaaaat tagccgggca tggtggcagg tgcctgtaat 9720 
cccagctact: cggagggctg aggcaggaga atcacttgaa cccaggaggc agaggttgca 9780 
gtgagccgag atcgtgccat tgcactccag cctgggtgac agagcaaaaa tccatctcaa 9840 
aaaaattaaa aatttaagaa aattaaaatc atacaatcat ctcaatatat: gtagaaaaag 9900 
cttttgataa aattaaacat ccctticatiaa tiaaaaacact tagactaggc atrcgaagaaa 9960 
catacttcaa aataataaga gccatctgtg acaaacccac agccatcatc acactgaatg 10020 
ggcaaaagct ggaggcacta tccttaagaa cagggaaaaa gacaagaatg ttcactctca 10080 
ctactcctat: tcaacatagt acrtagaagtt ctagaaagag caatcgagca ggagaaagaa 10140 
ggaaaatgca tccaaatacg aaaagaggaa gtcaaattat ctctctttac tgacaatatg 10200 
attatatgcc tagaaaaccc taaagacttt acaaaaagtt trccaaaactg ataaacaact 10260 
tcacrtaaagt: ttcaggatac aaaatcaatg tacaaaattc agtagcattt ctaaacaata 10320 
atgtccaagc tgageuiccaa atcaagaaca caatcccatt: ttcaatagcg acacacacac 10380 
acaaatgaaa tacctaggaa tacatctaac caaggaggta aaagatctct ataaggagaa 10440 
taaaaaaaca ctattgaaag aaatcggaga tgacacaaat gaatgcaaaa acattccatg 10500 
ctcatggatt ggaagaatca atattgttaa aatgtccctia ctgcccagag caatctacag 10560 
attcaatgct attcctatca aactaccaac ataattttcc acacaaagtt agaaaaagct 10620 
tttgtaaatt tcatatggta caaaaaaaaa aagccccaat agccaaagga ctcctaataa 10680 
aaaagaacag agccagaggc ctcacattat ctgacttcaa actatacttt aaggctacag 10740 
taatcciaaac agaatggcat tggtcaaaaa cagacatata aaccaataga acagaataga 10800 
gaacccagaa ataaagccac acatctacag ccatcagata ttcaataaaa ttaacaaaaa 10860 
taagcaatgg ggagagaact ttctattcaa taaatgcftgc tggaatagct agctagtcag 10920 
aagcagaaaa atgaaattgg actcctatca ctaaatacaa aaactaactc aagatgcagt 10980 
aaagaattaa atgtaagacc acaaacaatt aatacaagaa ccctagaaga aaacctagga 11040 
aatactgttg tagacatcag tcttggcaca gaatttagga ctaagtcctc aaaagcaact 11100 
gcaacaaaaa caaaaattga taagttggac ctaattaaac taaagaactt ctgcacaata 11160 
aaagaaacta tcaacagagt aaacaaacaa cctacagact gggagaaaat atttgcaaac 11220 
tatgcatctg aaaaggtcta atgtccagaa tctgtaaaga acttaaacaa ctcaacaagc 11280 
aaaagaaacc aagtaacgcc attaaaaagt aggcaaagaa catgaacaga tgcttcacaa 11340 
aagaagacat acaacgcagt: caagaaacatz atgaacaaat: gctccacatc actaattatc 11400 
caagtaatgc aaat:caaaac tiacagtgaga taatatctca taccagttiac aatggctatt 11460 
attaaagatit aaaaaaataa catgctgatg agactgcgga ggaaagagaa tgcttaaata 11520 
ctgttggaaa cgtaaatggg titcagccact: gtggaaagca gtttggagac ttctcaaagt 11580 
acttaaaatg gaactactat: tcaacctagc aatcctac^t actgggtgta tacccaaagg 11640 
agtataaact tittttcccag aaagacagct gcactctcac attaattacc acagtattca 11700 
caatagcaaa gatgtiggaat: caacctagat atccatcaat ggtggattgg acaaagaaac 11760 
tgtgagatat atatgt:atat: atatctatat ataccatgga atactatgta gccataaaaa 11820 
aggatgaaat catgtccttt gcagcaacat: ggatgtaaca ccacaaggaa ggcactttta 11880 
tctcctcttt acaggtaaga gaaccaagct: tctgaaatta aggtccatag ctggaaaatg 11940 
atggagggga gatttgaagt: catctaggca actccacaca tgtgctctitit ccactaaatt 12000 
gttctactgt caggaaggga ctcagctaag acagaagata aaattattiaa aatctiaaatc 12060 
aattcttctc tcat:t:t:catk ttrttaaatcc atgaagatta taaatcctct atgctgt:gct 12120 
agctaacttt *t:t:ct:t:gacag at:acat:tagg tatacttatit agagaaaaat attctct:ttc 12180 
tcatttccct: gtatcagttt: trtggtgagga aggcaaaggt aggaggaact gtaatagaga 12240 
aagatgaagg aagct:gatgg atatattgac atgt:gtatgt acatctagt:g tgaacaatct 12300 
atagttggaa gaaaggtgtzg gatgggt;atg ctttttgagg gaagtttttg agaaaagaag 12360 
taatatgaac t:attt:ctaaa tttcctgata aagttgtaaa t:acagcat:ag t:ctt:cacagg 12420 
agaatctatt tiagttitatca tcatcattca gcaaatacag catgatgtta ggcactataa 12480 
aaggctaaga aaaatgattc ticrkctctctc ataaactaat ccaatttaga gattitiagaag 12540 
acaacaaatc tiggagaggac atigaaccttc taaataatga cctt:cccttg ctttgggtat 12600 
cctggtttta aatat:tt:tta gtacagcttt aaatagatcc aaatgagat:a t:t:ttcct:ctt 12660 
t:tacaaaagc aabtcaaaga t:ctaggtttt tgttgtacac tgagaattaa tacttttttc 12720 
ttt:aaaatcc tt:aattgcaa at^ctttaaat tctataaata t.^ttgcct:tg tgatctcaga 12780 
aatataiagcc aat:t:tgggat atggatatct aa'tatattgc tract t:gtt:ac acgtgagtag 12840 
tgacagatgt ctrgtccattt ctttctgaca ttccacaaag aaacactgaa gaaggaccag 12900 
tgcaatcaaa gaaatgactg at:ggcat:cac aaaatatcac atcccatttg atgatctgat 12960 
tacctttttg tttagggtga tcagaaagtc acagtttcat ggcaccctcc acacccacac 13020 
accttgtatg acact:ggatc caactgcttt ctccaataga cacagcactt aaagatgtgg 13080 
cagttaggct ^gaccccaag aaggccaaaa agccttctgt: gagcatcact cagt:gct:cag 13140 
gttgactaag ct:ct:at:ccag gctitgagaga atggttcata gctgacttct tggatccaaa 13200 
aaaaaaaaaa aaacacctag agttttatac agatiatgata cgaacttiaaa aggactgcac 13260 
taaaaactac caagattatg attctitiattit: titggagagtia aagaaaatiag gctgcctttig 13320 
gagaggggtg caacagtittc tgat:cct:ct:t acaaactgct: t:gctgcccat cagtgggtag 13380 
gaggtcttag t:gagaaccta cctgcatigct: catcctgagg taggcactgt gaaggcgtta 13440 
acaggctctg aagctacatg gccctiggttt: cagt:gaact:c tigtggtgtica acttgggcaa 13500 
gtcactitcct cttctatgaa acgtgaataa tcatagtiact cacctbagag ggctgatttig 13560 
aaagcaaatg agctcaaaca caatgacatc trgtgcttggt: gcatatatgg cagacaacag 13620 
tgattcccac trabtataatt: attacagtct taccaaggag gagctttcca caaataatca 13680 
attacctaaa atgtccaaaa acaggaaaaa aaaatictctt ccgataat:t:c at gtgL aatt 13740 
titctttitttc tctaggagca ttgatctcaa cctgatgtaa agcaagcact ttaaaaagtc 13800 
ttataaaatt t:t:cctggtaa atigcaaaact: ttctgataaa taaatitctica cct:tt:t:tat:c 13860 
aatttgttaa t:t:caacaaaa atatacbaca taccaacagc abgcaaagca ctatgctaga 13920 
titttatagac tiatgaaaaga tiaaattgcca tctctatgca taaagggttt gccatt.taat 13980 
aaaagagact atatatttgc ataaatatat agtgaatata ttgcataaat atataatata 14040 
tgtttacatt aaagaataaa aggtataaga gggataagaa aaattgagac agagggaaga 14100 
caggtcagtt: tgagattaac gaatatcccc aaagaaggta ttatctgaga ttggccttga 14160 
aggatagttg t:gatt:cagga acacagaact tgcagaatga gaaggttgtt acagaccaaa 14220 
ggaacagcct gagaggcgtg agtatgcagg aaaatgaggg ccatgcctga aagtactggt 14280 
ggtgttgaag atggagccag gcaagtitggt cacagaggga gaggaccttg aatgtctaac 14340 
attgtggaca gaggctcaaa ggctcaaatt ccctattttt accttgagtt caatccttgt 14400 
ggcaatgaaa cctcagtgaa gctttattta aggctaaaag tgtctttt:aa aaatccctct 14460 
tatataatat cctttgcatg ttactcttgt tgtaattiagg agaaagcaat aggatctaaa 14520 
gttttttttc acagcatggt tttggtttct ttaattctaa ggagctcacc tggtgtitacg 14580 
ttggaaaaaa cagcttttat: atctcattta tattccatat gccagtctgc agtgacatat 14640 
ctatctgagg tttacagtgt: tagccacaaa acactcccta agtgaataca ttgactgctg 14700 
taaggggagc cagtcaggaa gcacctgcag agaaaagcag gcaacatgta taaacagagt 14760 
taattcagga atgaaagctg aatggctggg cgagtctgtt tgtttgagtt gacagcctct 14820 
ccctcactct titcatitaaat: atccaactaa ccttcaattg ccctcttgga acttaatctc 14880 
agtgtaattt ccagcatgtc aaaattatca agcagaaaga gatactaccc tgaaagaggg 14940 
tcttttgttc aatgctagga gacaaactcc aactacaaaa ttctagaaat gccctaaaga 15000 
gagagatagg atagatttac aaattgctaa tgctattagg ttgtatagat aacaatagat 15060 
ttataacaac ctggcacaca gctttaaata tataagtttc tctgaaactt ctgggaactt 15120 
ggaatgccag aacgttggca aaaagaatgc ttctaataat gaaagccatc atctgccatg 15180 
gaaacaattt cagggtcttt agaaagctag tttatacata agctccattc tacaataaaa 15240 
cttatgttca tgttttttct gattttcctc ctgctgtaaa ttcattttat cagaattctt 15300 
tttaccagtc cctctgcccc atttctcaaa gcgtt-gtccl: cagacliacct gtatcaccta 15360 
aagattctaa ggcctcctcc gatgtagtaa atgagacttt tctagagaga gagtcctaga 15420 
attttataaa gaaggatcct ttttattatt gtgatcacca aagttactkc tgcctagatt 15480 
cttctcatgt tatttttaca gctcctatct tcccagacaa cctaacaatt caaagataaa 15540 
attggtgctt ggtttagaca ttcatagcag gcacggtgcc agattgatga tgtcatccag 15600 
agtcaaaaac ttcatccaat gccttcacca aaaagttaca aatggccagg aatcaaatgt 15660 
ggttgaactt attcagaggg taattacaaa acaaacttct ttaaataccc aactgctatt 15720 
tgcttttttc cttctaaatt gtatcacttc tctccctgtt ccattttgtt tgccttttta 15780 
ttttttggaa tccctcacct ccatactgag tagtagagct ggctgtgggt gatgagagag 15840 
aaattgttat aacaaagtca ccctttcaaa aacatgtctt ccaaaagaat ttt:gl:tt:cta 15900 
gcagataaac cccacaccac ctcagctaaa tggggctttc tttatttaag taccaataaa 15960 
gacatatttt ggatactagc aatttatttt ccaaatgcta tctttgatct taagtttaag 16020 
gctattacca aatctatatc tctacaagtt ttatacttta ggtcaataaa ttacttgata 16080 
acttattact atgtgttcta caaaagaaac cgaagtaaaa tttacatcac atttaacagg 16140 
gtggttgtgt gattgagtgg gaagaggcgg accctacaga tagaagacCt: gggtttcagt 16200 
cccagcttac tagtatctgc gtgatgccag ggaaattcac ataatgcctc tgagtcacag 16260 
atttctaaca ggaatgaaga tacttcttcg cagaattgtc attagagtta aagaagataa 16320 
caaataatgt ggttcctgat gaggtattta tgaattcctg agcatgctaa ggaagttata 16380 
acttgtttct tgatccctga aacagctttc cctatatttg tgtgtgtgtg tgtgtgtgtg 16440 
tttcagtcat gcaagttggt ttttcttctc attccttgag aatttaggat attttgtgcg 16500 
cacatttggt tcttctgtcc aacatgaact gtagtacctt acccacattg agatgacact 16560 
atttctacca agtgagtgct aggggatact gcaagccgaa tgccaggtgt gagagaccac 16620 
agcatcacaa tacc£ft:ggca gtagattaaa gctgtgcata tggactaaaa gcagtggctt 16680 
tgcttctcct accttggtga cataaactga gtaacaaatt tgacctaata ctggaatacc 16740 
acctaattct tttttcctcc ctgatttacc ct:agagt:pca caattgacaa taatttaaaa 16800 
attttggctc tctcttaaat ccctaatgcc tcctccttac accttacaag caaagacctg 16860 
cagagctaag acctgtaatg ccaggatgga ggctagagga ccatcagcaa ttaactacca 16920 
aaacttaccc aacattttat atctgtttaa ccttcatagc cttatgagta gcagatcaat 16980 
atctttgttt tacaggttag aaaacCgagg ctcaaattga ttcagtaact ttgccaagat 17040 
tgcccagttt gggaaaagta gtatacgctc aaatccagga ctgaggcagg gttttctttg 17100 
tcaccactca aagcctctct gaatatccta tctctgctct gtatctctct gctactcctt 17160 
ct:a^ggt:gtt ttragcaaga^ a1:c1:l:ctact ccagaaacct actctagcac agtagaatta 17220 
c^ttgggbagg ttttttaaaa atatgagtgc ctaggtcccc tctagaccaa tcgaaaccaa 17280 
aattcttgga gaggatccc^ ggcatccata aattttttta attcatcaaa tgattctgtt 17340 
gcactgtgaa agc^gagatc caccaattta aataatgatg ttagttctct gaaaaaattt 17400 
ttgattgctt taacatttaa tcaaggatat attcctatta taaaatatat tat^aacaca 17460 
tagtttctct cttgttgtgt aacaggtgga tgagatattt atagattcag cct;ggaagata 17520 
aaattatatc aagaaccaag tagtcagact gacgtatgta tgtttgggca aaggtggaat 17580 
cacaagactg gagggaaaag gaacaaagga gacagggact ctcatgtatt gtatgtctcc 17640 
atggactagg cttttggcta gaatttttca taaacattac ctctaaagca gtcttgaagt 17700 
atagggctga ccaccgtt:t^ gtcaacaaaa agactaagat ^caggaaggg taagaaatat 17760 
gttcaaagtt caccaactga cagtttccca aagtgacaga accaggaatc aaaccccatt 17820 
aacttattgt gaggcctgga acctaccaga acccatgacg tggggaaaac ccagcagctt 17880 
gtcgttgcat gcaccaagtt atattatgtt gacaattata ttatttcaac cacgttaagc 17940 
aggcaaactt ggctataaaa tgggttcaca aattttacct gtaatgtaac cgaatgacat 18000 
aaggcatgcc taaacaaaaa gatattcctg ttgtaaCaaa ttttctttct gtcatggtgg 18060 
agggggaaga cbcatatcag t:t:gca9atat tgctcagaag tttcaattgt gttattttga 18120 
aaaactacat agcageiacac gcatgtcata tacacaaatc catgagcctg tatgactcat 18180 
atttcttaaa gataaagaaa aataatatat tcagattttg atttatttga agaaaataat 18240 
tatccctttc tcaccaatag act.aal;:aatg ctttgttggc aggtgtactc aaagttctct 18300 
atgtcttgac tgagtaacta gtgacttccg taaggatttt ataacataaa ttgggtaatt 18360 
cctacaatac ttaggaggga aaaagcatat aaatgctaga actttctaga tttcatgttt 18420 
tctgttttca aattctcctt taccatatta ttgtagcaac attattatac tcctgtgaac 18480 
tcctttggat ggtagccatc actatataat acctggtaaa aatgttaatt cctcagattt 18540 
aagaagtaaa attagtcatc tgtttgccaa tttgacataa aattctagtt atttagatct 18600 
ttatattcca gagcctaaat gaacaaaaat acataaattg tctcagaatt tccttttagc 18660 
caaaagattc agggagatgg gcctctagag tttttcacag tttttttttt ttttgtaaaa 18720 
aaaaaaaaaa aaaaaaaaag gagagataac agatcaatat atattagtitt caagcfttttt 18780 
tgtttttttt tttaaacaaa aacctgtaat tgcttttcct attttaacag tatttaaaag 18840 
tttagttcct caggtaacag aacttgaacc tgtttatatg atcaaagtitc aagaaatbgg 18900 
gcatgtttaa tttggagaag actcggggac cacaatattg ttgtcttcaa atatttgggc 18960 
^agaggagga aattatttta tgtatgttcc aactggtaga cctaagcctt atggaatggg 19020 
agatataggg agacatiattt caactcaaaa tgatgaactc ttaaaagcag agctgaccaa 19080 
agagaaacaa gcctctttag aaaattaaac ttactatctt tttaattact gcactgtcat 19140 
tagagggcca attgtcatgg accctgtaga agtgattcag gtatcaaata tacaattgat 19200 
tagcctaaga aaacatgaag gcttcttcta actctcagag cttgtaattt tgatgatgat 19260 
tttttatatc tgtcattcct agctgctgta acaatccttc aaattaatgg gggaaatgca 19320 
ctgaaaacat aatgaaagct agaagaggga acatat:gaaa tgaccttggg tcagaatgac 19380 
atgagaggat cagcacttga cactctcagc aactgaggga tcattcaggg gaggaagata 19440 
caggtaagac tgaaggacaa ttccaggtgt attctttgaa aatgtacctt tcttttgtgt 19500 
gtcacagtcc agaggaagat ggtgtgaaag tagatgtcat tatggtgttc cagttcccct 19560 
ctactgaaca aagggcagta agagagaaga aaatccaaag catcttaaat cagaagataa 19620 
ggaatttaag agccttgcca ataaatgcct catcagttca agttaatggt aaggaggtcc 19680 
ccttctatgt gatatgaagt tgtctattag gtccatgttt tgacgaatct caaatttatt 19740 
tgtcattatt tccatttcaa atiaatagcba gaabtcagat gaaaaaaCtc aagttaaaga 19800 
tgtgacattt caaggtytat tagt:ctctaa cgtaagcatg tctgaagt:ta gtcatccagt: 19860 
ggttttcccg acagtaattg attggcactc atcccaaaat ataggcaagc atttacaact 19920 
aacagagagt taatcccacc caggcactgc ctccatgact aagcaagtga aaatactagg 19980 
ggtttagcaa taattgtttt tctgggtggg accttcctaa aacacaaatt: catgtgttgc 20040 
catactttta ttgatagttt ct:atatatgg tgatiatacaa tttttgttag ctttttttcc 20100 
tatgggcatt tgggaaaatg gcaagccaac tttgaagttg ttiagagtcat: tttaccatta 20160 
atgctttaaa aatcacagtc taggaaaaca tcactgaaac tatgtgtaca ttgtticcact 20220 
tttctctttt tttttgttca cccttagccc atitataccat tatcacttcc ctcaattaag 20280 
gagaacaaac cttbatcaag gtctatctct atggccttta ccttaagtaa ctaatttictt 20340 
tttatattcc agtgacgtac gcaaattcac ctttatagaa gtgaaattca cacaaaaaga 20400 
gttgaggaat tcagtaatta aaaggagcta agaatcaaat ttaaatctct aatttcttaa 20460 
aaggctccaa ttaaaaaagg tttctatagt caaacacatc ttaaaaat:tc tggctttgat 20520 
actcgtttc^ tggaaattct tcct:taeagt: gtcatattaa aaattctaag gcagccagcti 20580 
agagagaaac ttgtttaccc tcgtccgcta agctgtttgc acagcatctt cttccaacag 20640 
acaagtatag atttct:cct:a caaatttcaa tggataccag acctaagbgt tacagaagag 20700 
attcagggca agcgattttt atcagacatg aaacaggaca ctctgccctt gtaagggtct 20760 
agctgacact tcaagaggaa accagataag gaagtaaaaa atgtgaggta atggaatggg 20820 
cagatgtttg ctgatgtgag aacgagtcag ctacttaggg aataaagctg aggacctctc 20880 
ccagccagaa gggaggciacc tgacaagtgc ttaatccatc ttctttgtta gatggggaag 20940 
caaatgaata gaagttgtga aacaatgggc attctgataa tttacatgat gctttctgtg 21000 
taatttccaa taaatagtta atttgtcagg aatgtaaeiag cctgaactat ctgaaaccag 21060 
agtaaagcat aaattgttca ttggctgcct ggtctttttg ttttttgtag gctcagcttc 21120 
taaacttcag cttattttaa taattgtact aaattaaatg gtaggatatg ctaatggaga 21180 
acctgatttg agagtcacct gaggctgggc atggtggctc aagcctataa ttccagcact 21240 
ttgggaggcc gaggcgggtg gatcacctga ggtcaggagt tcaagaccag cctggccaat 21300 
atggtgaaac cccgtctctt ctaaaaatac aaaatattag tcaggcctgg tgacgggcac 21360 
ctgtaatccc agctacttgg gagactgagg gggaagaatc acttgaaccc gggaggcgga 21420 
ggttgcagtg agccaagatc gcgccactgc actcaagcct gggcttgaca gagcaagact 21480 
ccatctccaa aaaaataaaa aataaaagag ttacctgacc aattctaact ccactaagtc 21540 
accacaggac cacccaaata attggctcat gcctttgtct tcattttctc atctgtaaaa 21600 
ttccaatggt aatgtttgtt cttcctgaaa tcacagagag attataacga tatacaagga 21660 
aatagaaaac acaatgtgaa ataaagaggc tgttactaat gagaaaacta ttatgttgtg 21720 
catatgcttt ggaaacctga aatcattaat ttgagtgatt gactagtagc agaaagatag 21780 
atccttgaaa gtttcagaat gttcaatgta gaaagaacag tgtttgttag tgatatggga 21840 
gcctaggggg tgttgctttt ctggccagaa acctctgtgg ccagtggttg gtgcctttgc 21900 
ccaagttttg ctctggccca ctgggcttgt tctgcccact tgacctggca gactgtgccc 21960 
accttccgct accagcctgg atcccatgcc caccaaggcc aacccaggca tggagctgtg 22020 
agggttgtct gagcgagcac agggtctggc cactgcccac agccaggcac actggctgca 22080 
gcatgacggg cagctccagg cactggcaca ggtgtgctgt ctctctgtga ggctgtggct 22140 
ggacaaagct cactgcaagc agcttccctg gcaggcacct gggaatgtgg tggcacccag 22200 
gaagcttgga gatgccagga actgcagggt cccaaagagg gagtcacaac cctggcttgg 22260 
ggagctccca ggtctgggat ccctaaaggg ctgcagcttt tctctctttt tacccacaat 22320 
gtggccagca aggggtatgt ttcattcct:g tttgtgttac agctctttta gtcttgctat 22380 
ttggcaggtc ctgagtitctt gtccbgagac caagaagaat gaggtatgca gacaagtgga 22440 
gggtgagcaa gacgaagaaa ggtttactga gcaagagaac agctcacagg agacccacag 22500 
^999cagctc ctcttcatag ccagggtgtc ccaacaagtg ticcagctcct agcaaagagg 22560 
aggccctgga ggtagaagct cctctctgca ggcaggttgt cctgttgagt gttcagcttt 22620 
cagcacacag taggcagtag gccctagagt: ggtctatctc ctctictgcag gcaggtagtc 22680 
ccatggtctc ccagbcacct ctccatctgc aagggtccaa tgctgcctcc agcacctctc 22740 
tgcccacccc tccgtgcctg accaagctgc tcccccacca gtgggcaact cagcccagcc 22800 
ccattgtggt agctcccagg gtggcaggct ctggggggct cccagggatg ggctccaagg 22860 
actgtccacc ttc^ccccac gcccticcctg cagliggccat ggtcaagaat ggcaatgtgg 22920 
ggccaggttc cggagcagga gaggctccag gcctgggagc aggtcctgcc tggtcacgtg 22980 
a99ttggggg tggcacagtc ggctigcctica gggatgtggg acacagggga cccaccacca 23040 
tcactgctac tcccgcatcc gctcctgctia ccactgctcc agacagcctg tagctigccat 23100 
cactagcact taagaaaggc acatt:cagt:g gacagctcag gaaaatcttt acgtcaa^tt 23160 
tttataggca aaaacattgt ttcctgggca aacaaaattt atggactacc aataaataga 23220 
aaactgtaga gather agat taagtctaga aataatcctg tagcccaaga tttatttata 23280 
attcgtcaag aatct:gt:at:t ttgttttgac aaaaaaaaaa ctgtgtggtg tgggtccttc 23340 
aggagacaca gtgtgacaaa gcaaagctaa aatcaacttc tt^tgcattgc aaacaccaag 23400 
gctgtagtca agcagctcac tgcctatgtg tcagatgact ttgcttcatt: tttcatcatg 23460 
at:actt:gt:ag tctatagagc cctgaatatit aact:agcttt ctcccaactc agaaccgt:gt 23520 
taggaggtgg ttgctttcaa aactaaagtg ttaatsrttta tttccattitc tiataccagga 23580 
aagtaaaaat ctttggtcaa aattagaaat ctti^aacaac tagttact:tg tgtatitgaca 23640 
gtttgtttcc aggtgtaatc attctccctt aaaatccggt: tatattcacg accattatac 23700 
ttatcctggt atcattcctg gaaat;ggct:a act:tgcat:cc tigctcagact aagttigacaa 23760 
agt:t:tcaat:t gaagaattcrt aact:t;t.at:gc tatttitccac t:t:tattgcat tacaaaggac 23820 
aaaatatata gttttcttaa aaatgaaa^a aat:t:tact:gc cttaaactac atttgacggt 23880 
aaactgagtt: cctbccatag aataaccac^ aacagcaatc gatggtcctg agcaattgac 23940 
tcttcaccat acaatgattt gggatgcctt taagggtata ttitgaattga atattttcaa 24000 
aagctcccac tttgtagagt titatcatcac tagtttcccc agtggaattt gtagaaagtt 24060 
agtagaatga aacaatctta ttttgt;ataa tgaggaat;ag aatactgaga atgtgtctga 24120 
gaaacatggc actggtiagga aaaagtiaaac agtittatitct catctgc^ca at;aagctaag 24180 
t;cattt:t.aac ttgaaaatca tzcaaaatttit catigaaacct: tccaccaacti Ct:at:t:t:ttcc 24240 
ccagctttag taagatataa ttgacaaata aaaattgtat actgtataca acatgatgct 24300 
ttgatacatg tatacaagtt t:aaat:at:ttg tgettcct:ta gtcaaactcc t:cact:ttt:tt 24360 
ggaagttgac agaatt:1:aat cttggattgt: gtccaataac tagcttttac cactatticag 24420 
tatattttgg ataagaaaca cataacagtt tattctttaa aaaagcaatt ttactattta 24480 
ggaactgtgt ttaaaaagca t:t:t:taaatat: ca^ttatgca agagt:tttca aggt:t:tt:t:tc 24540 
attctaaacc ctttaaccaa aaaaaaaaaa aaaaagat:tt at:gt:gaaatt cgaag^aaat 24600 
agaagagatc aaagcagat:c t:gt:t:ctggct gaggcbgagt tt:gagacctg t:aagacagtc 24660 
tacttgccat atggcttggc t gLgL cccca cccaaatctc at:ctcgaatt gtagccccca 24720 
taattcccac atgttgtgag agggacctgg tgggagataa attaaatcat gggtgcagtt 24780 
tcccccatac tgttctatgg tagtgaatga gatctgatgg Utttataaga ggcttcccct 24840 
ttcacttiggc tcacattctc t:gact:t:gct:t .gccaccatigt: aagacatgcc tittitgccttc 24900 
ctccatgatt gtgaggcctc cccagccaca tggaactctg agtccattiaa acctcttttt 24960 
ctttatiaaat t:acccagtct caga^atgtic tttatcagca gtgtgaaaac aaactaatat 25020 
aacctgtittc ctctigtccca ttta^ccatc ttctgaagtg gaaegcaaag aagct:ttacc 25080 
ccgaactgct ggaaaaccat agttctctat taatacaaac tatttgtggg ctttagtcat 25140 
ccactatttg tgccttactc acccattgct tgt:gatagta tccacctaat tagaggctgc 25200 
ct:at:aagtct ctacaaaaac tgtacacaga tgttgttata tcagatagcc attctcctaa 25260 
ttaatctata tgttcaactg tctagaatcc atatatggtc agtatcctict gattattcct 25320 
ggtcattgag accaaccagg aaaatatcaa att:atcacta tttgttttat cttctttttc 25380 
agcaatgagc tcat:caacag gggagttaac tgtccaagca agtaagtcaa gttagcttat 25440 
ataaacaagt tcaattttca caticagaaag gacatttt:ca aaeatttgct catacttgcc 25500 
catctgtcct ccagattttc tttgagagat aataactatt tgtacgatag atttaaatac 25560 
attt:tt:tttc taactcatgg actgatcttt tagtcatgtt: caagaaaaaa attgccatgg 25620 
taaccttctg gggcaatttg aagaaagcat ttatttttga ttgggaatat tggacttgtt 25680 
tttctaattt: ttaaaaatgc cataaaatgt actittctgct acaaaataaa ataa^aagaa 25740 
agtaaticaat aggaaggaca taaaacccat tgt:ctgtgac tgacaatttg tctrgtgaaat 25800 
atgctaaggt caggagttcg agaccagcct gaccaacatg gagaagaaaa cccatctcta 25860 
ttaaaaatac aaaaattagc caggtgcggt ggcaggtgcc tgtagtccca gctacttggg 25920 
aggctgaggc aggagaatca cttgaacctg ggaggcagag gttgcagtiga gccaagattg 25980 
caccactigca ctccagcctc agcgacagag tgagactcca tctcaaaaaa gaagaaaaaa 26040 
atatgcttaa tagattcatc t:taatcgct:a acagtggctt: cattaaatca ctitcaaatca 26100 
ctgtggccta aattttgaaa gattttacaa aaaacagtga tgaatttgag caatgatgtt 26160 
catgcatttg cctctgtgac ttgcaaacac cctaagt:att tttatccatg tgtttattca 26220 
titcaacaata tcttttaaca ^ctaccaagt gccagaaatt agaccaggag ttggtggtac 26280 
cattgtgaat aaaacatgat ccctgctcca aaattagaat tccaaagtag agaaagatat 26340 
aaataaatca ggaagtatga aaataatgtg attaatigcta tgacagagga agtgcatagt 26400 
gctatgagag ttgat:cagag agtcagctaa cctgttctca cacagtaaga aagtgaaccc 26460 
tgaaatgtga gagagaagag gccatgaatc cagtgacagg tggggtaagt gtcctgggca 26520 
ggaggagtag tabacgaaaa t:gtctt:cagg caagtaagaa t:ggggt:catt tcctgtaatt 26580 
acaagat.gtt tcttataact taatgatct:c atcttttttc aggttgtggt aaacgagttg 26640 
ttccat:t.aaa cgtcaacaga atagcatctg gagtcattgc acccaaggcg gcctggcctt 26700 
ggcaagcttc cctt:cagt:at gataacatcc atcagtgtgg ggccacciitg attagtaaca 26760 
cat:ggcttgt cactgcagca cactgcttcc agaagtaagt tattgacctt aagttagaac 26820 
ccacttctgc taaaaagccc tgagtitittgt: catattcttg gtaacaatta atgtctcaaa 26880 
t:att:act:gaa gtaaaataag aaaaagttiat t:tcaggtect t:t:tct:aaaat aatgtitiacac 26940 
ttgcatactt aatcagaaat: t:tgatgggaa taagtaacag tcattatcct agtatccat:c 27000 
aatcattticc tcaaagtett taataaggaa actgtgtaaa gaaatcagaa ctattttgtg 27060 
acat:cct:aac acaaaatatt cactaataac atgtaccatt aatcttttgt caaacaatgc 27120 
t:ctccact:1:a aaactag bgt. ct.gtttctgc caaacactitg ggccagtctc atactgatct 27180 
tiaaataatca aactaattcc aaagtaaaat: ggaaattttc aataaatgcc ggaagttggt 27240 
aaccgt:gat:g atggagaact: gcagatcaaa tttagagcat tgacatatga agatctgtgg 27300 
aatcagaaca gtttiacaacc aaaabgagag attgctagca tgataaagac aggcacttica 27360 
aaagaga title ctcggagtiat: caaaggatitic atagaggccc ttigggccact caatgtgacc 27420 
titcccatiaat; agagcatcfcc ttcacaatiag tigacacaaaa gacaaagctig aagtgaagaa 27480 
tagcaaatitig tgctiatccta tiaattgtttc tgaatgcata cattttatita aatat^atgat 27540 
tiaaatgactt tbtiataact^ fcbaat:ct:tac ttttcaagat aataaccagt catttittatc 27600 
ac^attiacat tbagaatitt^ agatittgtitit ctaagtiagat taactgtatc gcctititcbtc 27660 
titicat^gcca atitatitacag tiaatiaacaaa gacttcttiga gtatctctat atiaataggbg 27720 
gcagcaggat: t:t:agt:gggaa aaabatgticc caggcagtibg gagagcbggg caaatitattg 27780 
aaccttagtg batt:aggt:aa tiagatiaggcti agatctititibc acat:tctit:tit t:gacct:at:aa 27840 
aattctiaacti tttgttacta tiaatiaaatitit catttgccta ggagcataaa tctittiatiaga 27900 
gacticbtiaat: atiticcaaaga atatacatab taagaatctia ggctitggcati ggtiggctcat 27960 
gcctgtiaabc ccagcatitt^ gggaggccga ggcaagagga ccacttgagc tcaggagttc 28020 
aagaccagct bgggcaagat. agtigaaaccc cattigggcat ggtggtgcat acctaticabc 28080 
ccagctactt gggaggctaa cgcaggagga tccctt:aagc ccaggagttt: gaggcbccbg 28140 
caagctatiga ttgcaccact: gcacticcagc ctgagtigaca atgcaagacc ccatictbaaa 28200 
aaaatagtiaa tiatabtibtita aaaataatcti acataaatbc btaatgbbtg aaagatgtiga 28260 
gagctcagta agctgatata bbagaaagcc agaaatccct tiatgctggtig tcbggtitititt 28320 
caaagtiaatg ggaaactitac tbtigccaaag t:tiagccat:t:t ttgtiggtaga Hagttctatt 28380 
tittgcaaata tcttbatagc attgaacacc aaatctatac tctattaact tct:accatca 28440 
atatttigttit ttcttitbaab ctggaacaac aggaaccaat tttiatttctit catitcatata 28500 
acagctattc tt:tagt:t:bcb ctttttcaga ccaaacatiaa aatgagggag aatatccaaa 28560 
ccataagtga aaataaabat: cat:t:act:gtg agctttiagtit tigctiaaggat aatgaccticc 28620 
agccctatcc atigticcctgc aaagggcatg attttgttct t:t:t:tat:ggcti gcatagcatt 28680 
cccatiggtgt: at:gt.atiacca cattitticttt atccagtctia tcactaatigg gcatttaggt 28740 
tgatitctiatg tictttigcbab accgaagagt gcbagaggga gaggaticagg aaaaataact 28800 
aatgggtact aggcttiaata cctgggtgat gaaataatat gtacaacaaa accccatgac 28860 
acaagtttac ctgt.gtaaca aacctgcaca trgtaaccctg aactttgaaa aaagtatata 28920 
t:atgcacaca cabat.atatg catiacatata tiatgtgtgta tatatatgca tatatgtgtg 28980 
tgtgtatiata taaaaaaaaa bat:at:atata tatatatata bataattacc ticatttttcc 29040 
agaaccaact tccagatigcc ctaccacatt ggttctitatt ctctgaacat tcgagacttt 29100 
gticagbgtct ticctitiaaaab atgcttccaa taactaaata caccaagaca gatgtgtgac 29160 
tagtigtcaca catiaacaaaa taaagcagga agtct:t:ctga aaaatacaaa taatgtaaat 29220 
tggtgggaga cagtgtttita fcaaagggaag agcagagaga ggcaggcaga tiatgtgatgt 29280 
gaatcaaatia gtttaaccta ticcaggcttt: attttccttia agtataaaac acagtcttta 29340 
ctagatgatic tttcatbgct: actaaatgat tttitccgatt cctigtatgt:a ccataatcca 29400 
GccatUgccc aagcccacaa gctagaagtc aaccgcattt accacatttg atcatctctc 29460 
aaaggactat gcagticatct aatiagacttit accacatcca titctitigacct fccaagaatct 29520 
actccccaga aagaacaaac atgtttttta aaaatgtaaa tgagactaca ttattctctg 29580 
gcttaattat ccagtagatt cccatatcac ttcaataaaa tttaagcact ttatcatgac 29640 
ctataaaaca ctctaaaatc tagtccctgc ttacctctcc aagctcaccc ccaaccattc 29700 
tttcccttgt gttctgactg cagcccatcc aacccaagac cttgggattt ttgcctggaa 29760 
acttgtttcc ctcatctcct cacactgacc ctcttttact atgtcttagc ccaaatgcgt 29820 
tatcaaaata atcataatga cctgttagta ctctattccg ttaccctatt ttattttgtt 29880 
catagccttt atcaatgttt aagattattt atctatttgt ttgcttgctt tgatcctttt 29940 
ccttctctgg aatcttatac tcctgtgagc aggcacctta ggtcctgttc atcactttat 30000 
ccccagcagt: tcagataagg ctcagcacac agatgctcag taaatattitg tggaagggat 30060 
aaatgaatga tattttatgt gtattacagt tctaaaattc aatagttttg tattaaatat 30120 
cagttctaat atggcattta tatgatttta tctttcaaaa cattagcaat agattatatt 30180 
taaatgataa aagaaaact:a taactigcagc caagtattct caggattgta tttctcttat 30240 
attagcctaa atgcaattaa tctagctcat atactttggg cagcttatat atattctgtt 30300 
aatttctaac cttttccagg tataaaaatc cacatcaatg gactgttagt tttggaacaa 30360 
aaatcaaccc tcccttaatg aiaaagaaatg tcagaagatt tattatccat gagaagtacc 30420 
gctctgcagc aagagagtac gacattgctg ttgtgcaggt ctcttccaga gtcacctttt 30480 
cggatgacat acgccagatt tgtttgccag aagcctctgc atccttccaa ccaaatttga 30540 
ctgtccacat cacaggattt ggagcacttt actatggtgg tgggtatctc aggatagcta 30600 
acagagcgct aagccct:gt:c taaggcaatg tgatttcatc tccatcaata ttaticctgac 30660 
agccatttcc acacagtctg gttggattag ttagggttct tactttgtgt gacagaaatt 30720 
caattcacat taaccagtgc agaataaaaa acaaagaaac aaaaacttcc acaaatttgg 30780 
ctcatgtaat titggaagtca aaaaagtgta gtaagtttca cttcagacac aggggtttat 30840 
atgatgtcat ctggctctgt gtctctgaat ttgaattttt tgccccttct tttctctatg 30900 
ttggcttcat tcagagggat gctagcttca cctagtgtca gaggtggcta acaacacctc 30960 
aacacat-cat cctcaacaaa gaaaaaatac atagaaagga atat:t:tat:t:t: ct:tt:tctt:tg 31020 
ccagaattca cattaat:tt:c tattgttcca gctgtgtcta ggaggac^ca gattgagtgg 31080 
ctiaacticaaa t:at:tctt:tat gcctatgtag caaaatttgc ttcagtactg aagaagctiaa 31140 
tttaagtgtg atggtgaata agaatag t gL agagataaat: tgtcaaacta tttgtcccct 31200 
ctaaaagtat tcaacttgat aCactaactit: agtct:t:gt:aa gaaataatga tgat:tt.agtt 31260 
actgaatgtt ctaggcaatic ttagtigagac acgctictgga ttctaacatg tggtccaggt 31320 
acatatgtat: aacaaagct:a gaaagtttct titaacactgg gcttgagaaa atigcaaaagg 31380 
gctttctgag aatgacbaaa t:ctat:tt:gca ggatt:ctat:a caat:t:t:aet:t: acatacaaga 31440 
aattat:aaag aataagcttt tgattctcag tctaccatta aggaactagg aataaccttt 31500 
cactcacata ggcaggaatc ggttttaggg tctctagatt ttttccagat gtcccatgtg 31560 
gttttgtttt atcttiataca gagtgagaca tgcattgctt tctttaaggt tgtattacca 31620 
atcacagaaa atabtaccta tggtititattia attctiagtag atccagtgct gctgtaagcc 31680 
tgacacctcc ctaggtctgc actctcttgg atggattttc tctgaagatia gggctt:gcat 31740 
tictctgcttc atagtggtgg gaaagacatic acaaaticccc ttt:ggct:t:gg tgggaaaaat 31800 
cactttcagg agtttgagac tggcacagaa acatacctgt cataatgcgc tgtigagtggc 31860 
aacagaatct: gacact:tafca gagcacticca ccctacttga acacggccl:c tct:tggtgag 31920 
tgacccacag gtgcttttaa tctattaaat agat:t:aaatt aacctatcat tcttaatctg 31980 
titaagtacat taatagatta aaagcagcca ttcgttactc accaagagag gct:atat:tca 32040 
agtctgtaaa gcaaacctta agaagtttt:^ ^aaaattgaa attgtacaaa gtatattctc 32100 
tgatcataat ggaatc^aac ^agacatcag ^aacagaaag ataacataaa aaticcccaaa 32160 
tgcttaccaa ttaaaaaaca t:atgt:aaata aagagaatat ctcgaagaaa tttgtaaaaa 32220 
caaatagaac taaatgaaaa caaaaatata tiaaatatatg ccagatgctg ctaaaatagt 32280 
gtagaaaggg aaatttatiag aaaat:gcata t:tataaggaa agabatcaaa tcaataatta 32340 
agttctcact tcaagaaact agaaaaataa aaaaeaaacc taaaacaaac ataaggaagg 32400 
aaataataag aataagaat:a gaaatgaata aaattaaaaa taaactatag aaaattgaba 32460 
aataaaaagc tgattatttig ataeiaatcaa tattttgcta gaaatgtcat taagcatttt 32520 
tacagaagat gagatatagc ticagggatgt: ccagaattt:a tgggctabgc ttttcatgac 32580 
ttggaataca ttttaccaac cagtttagtt tgctgaagaa gttgtggatt tgcactgtca 32640 
cctacttaca atacttagat tigtcagtttc accttactct tctcaccatt aCtttatttt 32700 
tatttttatt tttattttta ttttgaaaca gagtctcgct ctgtctccca ggctggagtg 32760 
cagtggcgtg atctcggctic actgcaaact ccgcctcceg ggttcacgcc atbctcctgc 32820 
ctcagcctcc cgagtagctg ggactgcagg cgcccaccac catgcccggc taattgtttt 32880 
gtagttttag taaagaaggg gttticaccgt gttagccagg atggttttga tcticctgacc 32940 
tcgtgatcca cctgcctcgg cctcccaaag tgctgggatt acaggcgtiga gccaccgcgc 33000 
gccaggccat gaatgttttt aattgatgat atagtaggca atataaatgt gtgtgtgtgt 33060 
gtgtgtgtgt gtgtataata tatat:aaacc aattgtattc aaataacaga ataatttgaa 33120 
aaatctctta gcatatttct gagttacaca cttaaatctt ccgagcactt ttaaatatgt 33180 
gtttacaaac atttcttcag aaataaatct tggaaatcgt cttctaaaga aactggtgta 33240 
ttagggtttt ttcaaatgta cttagttttt tttttaattg atgtataaaa ttgcatgtac 33300 
ttaccatgtg caacataatg tgttgaagta tagtatatgt acactgtgag tgttaaatct 33360 
agttaactaa gaagcgtctt attttacata attatcattt ttgtggcaag aacacttaat 33420 
atctactctt gtagcgtttc tcaagaatac gatatatcaa cagtiaggcaa ccagaagctg 33480 
ggggtcttta caggggaagg agttagggag atigctggtca acaaattcat atttgcagtt 33540 
aggaagaaaa agttcaagag atctctcatc catcatggtg actatagctg atgatatatc 33600 
gtattcttgt attagttttt tataaatgtg taacaaataa tcacaaacag ttaaaacagc 33660 
actcatttat ttttatctca ctigttttcat gagtcagacg ttcagacaca gcttagttga 33720 
gtcctcttct cagggtctca ccaaactgt:a atcaaggtgt cagctggggt tgtggccaca 33780 
tctgtggctc ctttgaaggt ctcctcaagg tttgctggca gaattccttt actcgcagct 33840 
gtagaatgca tgccagcttg ct:gctt:taac tctttaggaa agtgtctcaa cliccagcaag 33900 
gctcgccctt tttgaaatgg ctcagctgat taggtcaggc ccacctttga taatctcctt 33960 
ttgatgaatt caaagtcaaa ctcattagag gticttaatcg catctgtaaa attccctcat 34020 
cttggccata taacataacc tiaatcatgag aatggcatcc ctcatattca cagatcctgc 34080 
ccatatttgg gaggagggga atcacacagg aatcttgggg acta^cc^ag aat:tctgcca 34140 
accat:ggggt catggt^tcc caatcaatat atggtt:tggt: atiaaagaatc cctgaatgct 34200 
tgtgctatbc t:tagttt:t:ct acgtagcctg ccataataat ggtttctiaaa acticagaacc 34260 
^agct-tiacag ^ctgcagcca ccaact:t:g1:a atacat:t:gga agt:gaaat.ca t.t:gccgtt.t:a 34320 
atgcatttat atatatatga tgtataatat atgtatattt cacatatatc ttatatatgt 34380 
gaaagctcat cataaacttt aaataataaa atiaaatgtiac atagtattat aggcatttta 34440 
tcaagccaat ggagaaaacc atictaggcat gcagagtttc tgggaacaat ctggaaccca 34500 
caaataaaag ctt:t:acaaaa gataaaaggc ctitcctgaaa tatataagct gattattttt 34560 
aaggttagat trttaccagga aaaagaatcc aaatggcttt cttgctttga gaagttttta 34620 
t:aaaaatgtg attggacaat aattaeccftt agatgtgcca gatttaacca gaaatt:cttt 34680 
tttctagaaa ctgcttatat taacttcatt ctgtattgac aattttacca tgaaaaaaat 34740 
attaggaaag tcttctcact tcactctagc caaagatgct gattgtaaat actagaataa 34800
ctctattttt ccttaagggg aatcccaaaa tgatctccga gaagccagag tgaaaatcat 34860 
aagtgacgat gtctgcaagc aaccacaggt gtatggcaat gatataaaac ctggaatgtt 34920
ctgtgccgga tatatggaag gaatttatga tgcctgcagg gtaagttgga gggatttttt 34980 
tatattacta actcaaaaat ttgtatctgg cttagaatat attatatgtt ctttacataa 35040
ggacaaaaca tagatatcat gtcagctcaa aaaagttaca aatgcaaatt tcacagcaca 35100
aaatactttt aaatgtttta ttaagataaa tgaagtaaga gtttctctga tgctatcaaa 35160
caaacaaaat tagaatttct taaccagaaa tccaaagatt aataaagcag tttattttct 35220
caagcggctc acattcaaga aagaaaataa tcataaacag agaagtataa agtgatgtta 35280
tgaataatat aatgaaaagc aaatattttt cttgaaggaa acatttttgg aacaagtatc 35340
agagagatga gacgtaaata aggcctgaag aataaataac atccaatttc agaataagaa 35400
aataatgtta tagaaaagac aaaaagcata gccaaaatta tgaaggtgtg aaattacaat 35460
tcatatctga gggaactcca agtaattggt tgggtctcag catgaggagg atgagaagag 35520
aaiacaagtag ataaccatga gaaggtggat taggccatgt tgtgattcca tgggccctcc 35580
ccagtgccct catctgcctt ctaacatgga tgttttccag cgaaggtacg tttcttcctg 35640
gagacacttg ctttttaaca tgagatactt tagaactcta aggaggccac t:ct.augt.gga 35700
aatgatggaa tggtattgat atcaggtggc agaaagtcct gtccagagtc ccacaaactg 35760
taccacatgt gcgacctcta tcagaaaagg agcagggacc tatgtgacat agaggctggg 35820
caaaagcagg atctggtcca cagccagcct cggttgctaa taatgtggag ggaggcaggc 35880
agaatttagg gattccaaca aaaggtccat accacgggga acaggtggaa ggtgcaggag 35940
tcttggagca gacaggaccg gggeiatiticag gtgaaccatg acattactga aaagccttag 36000
9&999attgg tggtcataga gatgcttcac tggattgggg agcagaggta aacttgctgc 36060
ctaactgtgc aaagtaagtg ataaaacaag gctttagtca tagaaaaata cagtaagtta 36120
tcagggcagc ggttcaggta caaggatcca agacaggaat acagtgattg taattggggc 36180
acatggtgag gggcctagtc tgatacaaca gaagtgcaag caccaccaac acctcgtctt 36240 
tctccataag tctttctctc cagagccctc atgacctaat cacctcttct taagtcccat 36300 
ctctcaacac tattgtattg gagattaagt ttccccaacc tatgaactct tgggctcaca 36360
ttcaaaccat agcaccaccc agcacaaaag cacagagctt ccaatctggt ttctagctcc 36420
ataccctaga accaaacagt aagaatcacc tctggaaatg tagcaataat ataatcataa 36480
tttttaaaat ccagtggaag gattggaaga taaaatcaag gaaatctctc agaaagaaca 36540
acaacaacaa aaaagacaca gaggagaaaa ataatcagaa aaattaagaa aactagagga 36600 
taagctcagg agatccaaca ccaaatgaat aggagctctg aaaacataaa acgcgagtgt 36660
acaatataaa aaaaaataaa gaatgctcct agttctgaag cttacatgca tcctattgaa 36720
gaaaagctcc aagtagtgct gggcacaata aatgaagtac ttctttccaa gacataccat 36780
cacaaagggt cagaagccag ggataaggag aacaatctta aaactttgaa ggaagaacca 36840
tcagaactac atagaactcc tcaacagtaa ctctagaagg tagacgatgg tggaaaacac 36900
attcaaattt caaagggaag attatttcaa cctagattcc tacccatgct aactaaatat 36960 
caactgtgag ggtggaatta agaagtttag acaagcaatg actgaaaaaa atgtacttct 37020 
gataccctac ttcttaggaa actacttgag agggtacctc agcaaaatga gggaataaat 37080 
caagaaagtg gaagacgtaa gacctgaaac tgttagtcca acactaaaga gtggtatcag 37140 
ataatcccaa caccatagct ctgcaccagg cttaaagtaa ccagctcgaa tttgagcaga 37200 
agtaagaaaa gattgtgtgt atgtgtatgt gtatgtgtgt atgtgtgtgt gtgtgtgtgt 37260 
gtgtgttgat atggtggaac agcttcagag gaagtaaaag aactaacaag ctatctgatg 37320
tccttgaaca ttagtaaaca ttattgtgag gtgttggtag atcttttgga gcattcagca 37380
tttaccaggt acatagaaaa ctatccacat gaaaaaaaga gttgtgttat taattctagg 37440 
aaagcaaaaa aagatttctg taatccaaat atgttacttg actcttcaat taataaaatt 37500 
tacacactgg tactaaatgt aggctgttaa tttaaccaaa aatagagatg ctataatgta 37560 
aagatgtggt gtggaaaagt tgcaaagaag ttgtaaaaca actaaatccc taactacgta 37620 
agagaaaata aatatttact gtctaaacct agaagctgta atttgagcat attatctagt 37680 
gataaggagt tagatactat aagaaatcat taaacaagca tgaagtggct acctcttgga 37740 
gaacagcttg cgtgaggtaa catgggacat aactgctttt caagcctctt catgtttttt 37800
cgtttttgcc ttttttaact aagtgctgtt tactctaaca aaataaattt tattttttaa 37860 
atgtgaaagt tgaaccttaa ggctctttgt aatattaaaa tccatgtctc aattaattat 37920 
tctgtgttga tagtctatac atgtactgtc tagtaacaaa atatgtgatt catcaaaata 37980
tcttaaataa tgagctttat gtttagctaa ttttctttct tttttcttat gtttttattt 38040 
ttagggtgat tctgggggac ctttagtcac aagggatctg aaagatacgt ggtatctcat 38100 
tggaattgta agctggggag ataactgtgg tcaaaaggac aagcctggag tctacacaca 38160
agtgacttat taccgcaact ggattgcttc aaaaacaggc atctaattca cgataaaagt 38220
taaacaaaga aagctgtatg caggtcatat atgcatgaga attcaactat ttagtgggtg 38280 
tagtacaaca aagtgatatt aaattactgg atctagtaac atgaaacaca caacgtaagt 38340
tatttagaat cactttaatc aaccaataat ccttagccaa tttataaggg acttttattt 38400 
gtaaagtaat ggatctggct tgaaaaatac ggtagagata cttagctctt taaatcacga 38460
atgttgaagt accagtgaga ctcaatacat atttttgaag atagtccatg ggatttttag 38520 
aatgtcgttg tcaagggtct ccttttaact gagaaacttt ttgaactcac aaagtgttca 38580 
agaaaccctt gtataattcc ctacatttct ctcgagctca caaatacttt tttttctttt 38640 
tccttattca atcagatttt ccaaagtacc tttccaccat aagaaatgaa ttttctactt 38700
ctacacccat ttgagagaca ccaataaaag aaagtcatat gtaggaaaca aagtctgata 38760 
gtaaaacaag ccagagatct tctaactttt tttagttata aaacctctaa tttttggtga 38820 
cttttctaca cacacacaca cata 38844 
Lys Ile Lys Lys Ile Asn Lys Thr Glu Thr Asp Ser Tyr Leu Asn His
145 150 155 160 

Cys Cys Gly Thr Arg Arg Ser Lys Thr Leu Gly Gln Ser Leu Arg Ile

165 170 175 

Val Gly Gly Thr Glu Val Glu Glu Gly Glu Trp Pro Trp Gln Ala Ser

180 185 190 

Leu Gln Trp Asp Gly Ser His Arg Cys Gly Ala Thr Leu Ile Asn Ala

195 200 205 

Thr Trp Leu Val Ser Ala Ala His Cys Phe Thr Thr Tyr Lys Asn Pro

210 215 220 

Ala Arg Trp Thr Ala Ser Phe Gly Val Thr Ile Lys Pro Ser Lys Met
225 230 235 240 

Lys Arg Gly Leu Arg Arg Ile Ile Val His Glu Lys Tyr Lys His Pro

245 250 255 

Ser His Asp Tyr Asp Ile Ser Leu Ala Glu Leu Ser Ser Pro Val Pro

260 265 270 

Tyr Thr Asn Ala Val His Arg Val Cys Leu Pro Asp Ala Ser Tyr Glu 

275 280 285 

Phe Gln Pro Gly Asp Val Met Phe Val Thr Gly Phe Gly Ala Leu Lys

290 295 300 

Asn Asp Gly Tyr Ser Gln Asn His Leu Arg Gln Ala Gln Val Thr Leu
305 310 315 320

Ile Asp Ala Thr Thr Cys Asn Glu Pro Gln Ala Tyr Asn Asp Ala Ile

325 330 335 

Thr Pro Arg Met Leu Cys Ala Gly Ser Leu Glu Gly Lys Thr Asp Ala 

340 345 350 

Cys Gln Gly Asp Ser Gly Gly Pro Leu Val Ser Ser Asp Ala Arg Asp

355 360 365 

Ile Trp Tyr Leu Ala Gly Ile Val Ser Trp Gly Asp Glu Cys Ala Lys

370 375 380 

Pro Asn Lys Pro Gly Val Tyr Thr Arg Val Thr Ala Leu Arg Asp Trp
385 390 395 400 

Ile Thr Ser Lys Thr Gly Ile

405 